Diffusion Transformer
Vision Bridge Transformer at Scale
Published: 11/28/2001
Diffusion Transformer, Image and Video Editing Tasks, Large-Scale Data Processing, Bridge Models, Input-to-Output Trajectory Modeling
The Vision Bridge Transformer (ViBT) is a large-scale implementation of Brownian Bridge Models for efficient conditional generation. It improves data translation by modeling input-to-output trajectories and achieves robust performance on large-scale image and video editing tasks
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Published: 8/12/2024
Text-to-Video Generation, Diffusion Models, Diffusion Transformer, 3D Variational Autoencoder, Video Generation Quality Improvement
CogVideoX is a large-scale text-to-video model using a diffusion transformer that generates 10-second videos at 16 fps and 768×1360 resolution. It addresses coherence and semantic alignment issues with methods like 3D VAE and expert transformers, achieving significant quality imp…
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Published: 6/25/2024
Diffusion Model Quantization, Post-Training Quantization, Diffusion Transformer, Dynamic Activation Quantization, ImageNet Dataset
The paper introduces Q-DiT, a method for accurate quantization of Diffusion Transformers (DiTs), addressing spatial and temporal variance in weights and activations. By combining automatic quantization and sample-wise dynamic activation quantization, Q-DiT reduces computational c…