Text-to-Video Generation
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
Published: 10/31/2025
Tags: Distribution Matching Distillation, Multi-Step Distillation Framework, Enhancing Generative Model Capacity, Mixture-of-Experts Architecture, Text-to-Video Generation
The Phased DMD method enhances model capacity in complex generative tasks by improving on traditional one-step models. It uses score matching within SNR subintervals, reducing learning difficulty while addressing instability and efficiency issues in multi-step distillation.
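The core idea of "score matching within SNR subintervals" can be illustrated with a toy sketch: partition the diffusion timestep schedule into a few contiguous phases so each distillation stage only has to match scores inside its own subinterval. The helper below is a hypothetical simplification (the paper partitions by signal-to-noise ratio, not raw timestep index), assuming only numpy.

```python
import numpy as np

def snr_subintervals(timesteps, num_phases):
    """Split a descending timestep schedule into contiguous subintervals,
    one per distillation phase. Hypothetical helper: Phased DMD partitions
    by SNR; raw timestep indices stand in for SNR here for simplicity."""
    return np.array_split(timesteps, num_phases)

# toy schedule: 1000 diffusion steps distilled into 4 phases
timesteps = np.arange(999, -1, -1)
phases = snr_subintervals(timesteps, 4)
# each phase's student step only matches scores inside its subinterval,
# narrowing the distribution each stage must learn
```

Restricting each phase to one subinterval is what lowers the per-stage learning difficulty relative to distilling the full noise range at once.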
CameraCtrl: Enabling Camera Control for Text-to-Video Generation
Published: 4/3/2024
Tags: Video Generation Control, Camera Trajectory Parameterization, Diffusion Model Camera Control, Text-to-Video Generation, Controllable Video Generation
This paper presents CameraCtrl, a method for precise camera pose control in video generation. By utilizing an effective camera trajectory parameterization and a plug-and-play control module, CameraCtrl enhances user controllability and creative expression without affecting other…
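One way to parameterize a camera trajectory as a dense input a control module can consume is the Plücker embedding of per-pixel rays, which CameraCtrl adopts: each pixel's ray (origin o, direction d) becomes the 6-channel pair (o × d, d). A minimal sketch assuming numpy; shapes and the helper name are illustrative, not CameraCtrl's actual API.

```python
import numpy as np

def plucker_embedding(origin, directions):
    """Per-pixel Plücker ray embedding (o x d, d): encodes a camera pose
    as a dense 6-channel map. `origin` is the camera center (3,),
    `directions` holds per-pixel ray directions (H, W, 3)."""
    d = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    moment = np.cross(np.broadcast_to(origin, d.shape), d)
    return np.concatenate([moment, d], axis=-1)  # (H, W, 6)

# toy 2x2 image whose rays all point along +z from an offset camera center
dirs = np.tile(np.array([0.0, 0.0, 1.0]), (2, 2, 1))
emb = plucker_embedding(np.array([1.0, 0.0, 0.0]), dirs)
```

Because the embedding has the same spatial layout as the video frames, it can be fed to a control branch alongside the latents, one 6-channel map per frame of the trajectory.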
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Published: 12/2/2024
Tags: Customized Motion Transfer, Multimodal Large Language Model, video diffusion models, Motion Modeling, Text-to-Video Generation
MoTrans introduces a customized motion transfer method using a multimodal large language model recaptioner and an appearance injection module, effectively transferring specific human-centric motions from reference videos to new contexts, outperforming existing techniques.
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Published: 8/12/2024
Tags: Text-to-Video Generation, Diffusion Models, Diffusion Transformer, 3D Variational Autoencoder, Video Generation Quality Improvement
CogVideoX is a large-scale text-to-video model using a diffusion transformer that generates 10-second videos at 16 fps and 768×1360 resolution. It addresses coherence and semantic alignment issues with methods like a 3D VAE and expert transformers, achieving significant quality improvements.
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets
Published: 11/26/2023
Tags: Diffusion Models, Video Generation Models, Text-to-Video Generation, High-Quality Video Fine-Tuning, Video Dataset Curation
The paper presents Stable Video Diffusion (SVD), a model for high-resolution text-to-video and image-to-video generation. It evaluates a three-stage training process and highlights the importance of well-curated datasets for high-quality video generation, demonstrating strong performance.
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
Published: 3/22/2024
Tags: Text-to-Video Generation, Long Video Generation, Autoregressive Video Generation, Conditional Attention Mechanism, Video Enhancement Application
This paper presents StreamingT2V, an autoregressive method for long video generation, addressing limitations of existing text-to-video models. It utilizes a Conditional Attention Module and an Appearance Preservation Module for smooth transitions and scene feature retention, alongside…
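The autoregressive scheme can be sketched as a loop that conditions each new chunk on the last few frames generated so far, which is roughly the role the Conditional Attention Module plays for smooth transitions. Everything here is a simplified stand-in, assuming a hypothetical `generate_chunk` model call; frame counts are arbitrary.

```python
def generate_long_video(generate_chunk, num_chunks, overlap=8, chunk_len=16):
    """Autoregressive long-video sketch in the spirit of StreamingT2V:
    each new chunk is conditioned on the trailing `overlap` frames of the
    video so far, then only its novel frames are appended."""
    video = generate_chunk(condition=None, length=chunk_len)
    for _ in range(num_chunks - 1):
        condition = video[-overlap:]
        new_chunk = generate_chunk(condition=condition, length=chunk_len)
        video.extend(new_chunk[overlap:])  # drop the re-generated overlap
    return video

# stub "model" that labels frames with their global index, so the
# stitching logic can be checked without a real generator
def stub_model(condition, length):
    if condition is None:
        return list(range(length))
    nxt = condition[-1] + 1
    return list(condition) + list(range(nxt, nxt + length - len(condition)))

video = generate_long_video(stub_model, num_chunks=3)
```

Conditioning on the overlap rather than generating each chunk independently is what keeps chunk boundaries from producing abrupt scene changes.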
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Published: 10/6/2022
Tags: Text-to-Video Generation, Long Video Generation, Variable Length Video Generation, Joint Training on Image-Text Pairs, Temporal Encoding with Transformer
Phenaki is a model designed for generating variable-length videos from text prompts. It introduces a novel video representation method using causal attention and a bidirectional masked transformer, overcoming challenges of computational cost and data scarcity while improving temporal coherence.