Distributed Training Optimization
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training
Published: 2/28/2025
Long-Context Modeling, Large Language Model Training, Weight Pipeline Parallelism, Distributed Training Optimization, Communication Efficiency Enhancement
WeiPipe is a weight pipeline parallelism method that reduces communication costs in long-context large model training by pipelining weight transfers and overlapping them with computation, improving scalability and throughput over existing pipeline-parallel methods.
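As a hedged illustration of the communication/computation overlap described in the summary (not the paper's actual implementation), the sketch below circulates weight shards around a ring of pipeline ranks with non-blocking point-to-point calls while each rank computes with the shard it already holds. The function name and ring schedule are assumptions for illustration only.

```python
# Minimal sketch: overlap weight-shard communication with computation.
# Assumes torch.distributed is already initialized across the pipeline ranks.
import torch
import torch.distributed as dist

def ring_weight_pipeline_step(local_weights, activations, rank, world_size):
    """Compute with the current weight shard while the next shard is in flight."""
    send_to = (rank + 1) % world_size
    recv_from = (rank - 1) % world_size

    recv_buf = torch.empty_like(local_weights)
    # Post non-blocking send/receive for the next weight shard.
    reqs = dist.batch_isend_irecv([
        dist.P2POp(dist.isend, local_weights, send_to),
        dist.P2POp(dist.irecv, recv_buf, recv_from),
    ])

    # Overlap: compute with the shard we already hold while the transfer runs.
    out = activations @ local_weights

    # Wait for the incoming shard before it is used in the next step.
    for req in reqs:
        req.wait()
    return out, recv_buf
```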
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Published: 4/16/2021
Extreme-Scale Deep Learning Model Training, ZeRO-Infinity System Technology, Heterogeneous Computing Across GPU, CPU and NVMe, Distributed Training Optimization, Fine-Tuning Trillion-Parameter Models
ZeRO-Infinity leverages GPU, CPU, and NVMe memory to break the GPU memory wall, enabling trillion-parameter model training and fine-tuning without code refactoring, and achieving high throughput and super-linear scalability in extreme-scale deep learning.
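The heterogeneous-memory idea can be sketched with a DeepSpeed-style ZeRO stage 3 configuration that offloads parameters and optimizer state to NVMe. The key names below follow common DeepSpeed configs, but the /local_nvme path and batch size are placeholder assumptions; check your DeepSpeed version's documentation for the exact schema.

```python
# Minimal sketch of a ZeRO-Infinity-style setup: ZeRO stage 3 with NVMe offload.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,                        # Partition params, grads, and optimizer state.
        "offload_param": {                 # Spill parameter shards beyond GPU memory.
            "device": "nvme",
            "nvme_path": "/local_nvme",    # Hypothetical NVMe mount point.
        },
        "offload_optimizer": {             # Keep optimizer state on NVMe as well.
            "device": "nvme",
            "nvme_path": "/local_nvme",
        },
    },
    "fp16": {"enabled": True},
}

# Usage: wrap an existing model without refactoring its code.
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, config=ds_config, model_parameters=model.parameters())
```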