Linear Attention Mechanism
Bridging the Divide: Reconsidering Softmax and Linear Attention
Published: 12/9/2024
Linear Attention Mechanism · Limitations of Softmax Attention · Vision Transformer Architecture · Long-Range Information Modeling · Attention Weight Confusion
The paper compares Softmax and linear attention, revealing that the performance gap stems from linear attention's non-injective nature and its lack of local modeling. The findings suggest that adding these traits lets linear attention outperform Softmax attention while reducing computational cost.
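As a rough illustration of the contrast the paper draws, the sketch below compares standard softmax attention with a kernel-based linear attention. The elu(x) + 1 feature map, shapes, and scaling are common illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # q, k, v: (batch, n, d); cost is O(n^2) in sequence length n.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    # Replace exp(q·k) with phi(q)·phi(k), phi(x) = elu(x) + 1 (an
    # illustrative positive feature map). Reassociating the matmuls
    # drops the cost to O(n·d^2). The paper attributes part of the gap
    # to this kernelized form being non-injective: distinct queries can
    # end up with identical attention weights after normalization.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                # (batch, d, d)
    z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1) + eps  # (batch, n, 1)
    return (q @ kv) / z

# q = k = v = torch.randn(2, 1024, 64)  # both functions return (2, 1024, 64)
```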
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Published: 8/22/2025
Post Neural Architecture Search · Hybrid-Architecture Language Models · Efficient Generation Inference · Linear Attention Mechanism · Hardware-Aware Hyperparameter Search
Jet-Nemotron uses PostNAS to freeze pretrained MLP weights and optimize the attention blocks, creating efficient hybrid-architecture language models that match or surpass the accuracy of leading models while boosting generation throughput by up to 53.6×.
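A minimal sketch of the freezing step described above, assuming a PyTorch model whose feed-forward parameters are identifiable by an "mlp" substring in their names; the naming convention and helper are hypothetical, not Jet-Nemotron's code.

```python
import torch.nn as nn

def freeze_mlp_train_attention(model: nn.Module):
    # Freeze every parameter whose name marks it as part of a feed-forward
    # (MLP) block; leave the attention parameters trainable. The "mlp"
    # naming convention is an assumption about the model.
    for name, param in model.named_parameters():
        param.requires_grad = "mlp" not in name
    return [p for p in model.parameters() if p.requires_grad]

# trainable = freeze_mlp_train_attention(model)
# optimizer = torch.optim.AdamW(trainable, lr=1e-4)  # train attention only
```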
LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution
Published: 10/10/2025
Image Super-Resolution · Linear Attention Mechanism · Perception-Distortion Trade-off Optimization · Early-Stopping Guided Fine-tuning · SNR-based Mixture of Experts
LinearSR enables stable, efficient image super-resolution by addressing training instability, the perception-distortion trade-off, and guidance efficiency through a novel fine-tuning scheme, SNR-based experts, and a lightweight guidance strategy.
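As one hedged reading of the SNR-based mixture-of-experts idea, the sketch below routes an input to one of several experts by the signal-to-noise ratio of the current diffusion step. The thresholds, expert count, and SNR formula are assumptions for illustration, not LinearSR's actual design.

```python
import torch.nn as nn

class SNRMoE(nn.Module):
    # Route each input to one expert chosen by the signal-to-noise ratio
    # of the current step; SNR = alpha_t^2 / sigma_t^2 is the standard
    # diffusion definition, the band thresholds are illustrative.
    def __init__(self, make_expert, snr_thresholds=(0.1, 1.0, 10.0)):
        super().__init__()
        self.thresholds = snr_thresholds
        self.experts = nn.ModuleList(
            [make_expert() for _ in range(len(snr_thresholds) + 1)]
        )

    def forward(self, x, alpha_t: float, sigma_t: float):
        snr = alpha_t ** 2 / sigma_t ** 2            # noise level of this step
        idx = sum(snr > t for t in self.thresholds)  # index of the SNR band
        return self.experts[idx](x)

# moe = SNRMoE(lambda: nn.Linear(64, 64))
# y = moe(x, alpha_t=0.9, sigma_t=0.44)  # SNR ≈ 4.2 selects the third expert
```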
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
Published: 10/9/2025
Video Diffusion Models · Linear Attention Mechanism · Post-Training Sparse Attention Optimization · Efficient Video Generation · Distribution Matching Objective
LinVideo is a data-free post-training framework that selectively replaces self-attention with linear attention in video diffusion models, using anytime distribution matching to maintain performance while achieving up to a 15.92× latency reduction and a 1.25–2× speedup.
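A minimal sketch of the selective-replacement idea, assuming a model that exposes a `blocks` list with an `attn` attribute per block; the attribute names and the `make_linear_attn` factory are hypothetical, not LinVideo's code.

```python
import torch.nn as nn

def replace_attention(model: nn.Module, layer_ids, make_linear_attn):
    # Swap self-attention for a linear-attention drop-in only in the
    # selected transformer blocks; the remaining blocks keep quadratic
    # self-attention. `model.blocks` / `block.attn` are assumed names.
    for i, block in enumerate(model.blocks):
        if i in layer_ids:
            block.attn = make_linear_attn(block.attn)  # may reuse q/k/v weights
    return model

# model = replace_attention(model, layer_ids={2, 5, 8}, make_linear_attn=to_linear)
```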