Papers

Towards Expert-Level Medical Question Answering with Large Language Models
Published: 5/17/2023
Large Language Models in Medical Question Answering · Med-PaLM 2 · Medical Domain Fine-Tuning · MedQA Dataset · Medical Knowledge Retrieval
This paper presents Med-PaLM 2, a significant advance in medical question answering that scores 86.5% on the MedQA dataset, improving over its predecessor Med-PaLM by more than 19% through base LLM improvements, medical-domain fine-tuning, and novel prompting strategies.
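Where the summary mentions novel prompting strategies, here is a minimal sketch of one generic ensemble-style strategy (self-consistency voting over sampled answers), offered purely as an illustration; it is not Med-PaLM 2's ensemble refinement procedure, and `sample_answer` is a hypothetical placeholder for an LLM call.

```python
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical placeholder for sampling one answer from an LLM."""
    raise NotImplementedError("plug in an actual model call here")

def vote_answer(question: str, n_samples: int = 11) -> str:
    # Sample several independent answers and return the most common one.
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```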
Memory Forcing: Spatio-Temporal Memory for Consistent Scene Generation on Minecraft
Published: 10/4/2025
Autoregressive Video Diffusion Models · Minecraft Scene Generation · Spatio-Temporal Memory Framework · Geometry-Indexed Spatial Memory · Incremental 3D Reconstruction
The paper introduces the 'Memory Forcing' framework, which combines spatio-temporal memory for consistent scene generation in Minecraft. It features hybrid training and chained forward training to guide the model toward temporal memory during exploration and spatial memory when revisiting previously generated areas.
WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
Published: 12/2/2025
Video World Models · Long-Term Sequential Modeling · Compressed Memory Mechanism · Minecraft LoopNav Benchmark · Spatial Consistency Improvement
WorldPack is a video world model that uses compressed memory to enhance spatial consistency and fidelity in long-term generation, outperforming state-of-the-art models on the LoopNav benchmark in Minecraft.
TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model
Published: 1/1/2026
4D World Model · Dynamic Multimodal Synthesis · Video Generation and Reconstruction · Long-Horizon Consistency Modeling · Autoregressive Diffusion Video Model
The paper presents TeleWorld, a real-time multimodal 4D world model that addresses limitations in video generation by integrating video synthesis and dynamic scene reconstruction within a closed-loop framework, using a novel generation-reconstruction-guidance paradigm for consistent long-horizon modeling.
SANet: Multi-Scale Dynamic Aggregation for Chinese Handwriting Recognition
Published: 9/15/2025
Chinese Handwriting Recognition · Multi-Scale Dynamic Aggregation · Star Attention-based Network · Feature Extraction and Generalization · CASIA-HWDB Dataset
This paper introduces SANet, a Star Attention-based Network that uses Multi-Scale Dynamic Aggregation for Chinese handwriting recognition, achieving 98.12% character-level accuracy on CASIA-HWDB with improved feature extraction and robustness from a lightweight design.
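As a rough illustration of what multi-scale aggregation with data-dependent weights can look like (not SANet's actual module), the sketch below pools a feature map at several scales and fuses the results with softmax weights; it assumes the sequence length is divisible by every scale.

```python
import numpy as np

def multi_scale_aggregate(feat, scales=(1, 2, 4)):
    """Average-pool a (C, L) feature map at several scales, upsample back,
    and fuse with softmax weights (a generic illustration, not SANet)."""
    C, L = feat.shape
    pooled = []
    for s in scales:
        # pool by factor s, then nearest-neighbour upsample back to length L
        trimmed = feat[:, : (L // s) * s].reshape(C, -1, s).mean(axis=2)
        pooled.append(np.repeat(trimmed, s, axis=1))
    stack = np.stack(pooled)                        # (S, C, L)
    gates = stack.mean(axis=(1, 2))                 # one scalar score per scale
    weights = np.exp(gates) / np.exp(gates).sum()   # softmax over scales
    return np.tensordot(weights, stack, axes=1)     # weighted sum -> (C, L)

out = multi_scale_aggregate(np.random.randn(8, 16))
```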
MultiRAG: A Knowledge-guided Framework for Mitigating Hallucination in Multi-source Retrieval Augmented Generation
Published: 8/5/2025
Multi-Source Retrieval-Augmented Generation · Knowledge-Guided Approach · Hallucination Mitigation · Logical Relationship Graph Construction · Multi-Level Confidence Calculation Mechanism
MultiRAG is a knowledge-guided framework designed to mitigate hallucination in multi-source retrieval-augmented generation. By constructing logical relationships with multi-source line graphs and applying a multi-level confidence mechanism, it reduces hallucinations caused by inconsistent or unreliable source information.
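A toy sketch of one way to gate multi-source retrievals by confidence: keep only statements corroborated by several independent sources. This stand-in heuristic is an assumption for illustration, not MultiRAG's line-graph and multi-level confidence mechanism.

```python
from collections import defaultdict

def filter_by_agreement(retrievals, min_sources=2):
    """retrievals: list of (source_id, statement) pairs.
    Keep statements supported by at least `min_sources` distinct sources,
    a crude proxy for a confidence score."""
    support = defaultdict(set)
    for source, statement in retrievals:
        support[statement].add(source)
    return [s for s, srcs in support.items() if len(srcs) >= min_sources]

hits = [("wiki", "Paris is the capital of France"),
        ("news", "Paris is the capital of France"),
        ("blog", "Paris has 20 million residents")]
print(filter_by_agreement(hits))  # only the doubly-supported statement survives
```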
ClosureX: Compiler Support for Correct Persistent Fuzzing
Published: 2/6/2025
Persistent Fuzzing · Compiler Support · Software Testing Techniques
ClosureX introduces a novel fuzz-testing mechanism that addresses semantic inconsistencies in persistent fuzzing. It achieves near-persistent performance with fine-grained state restoration, increasing test-case execution rates by over 3.5x while enhancing bug-discovery capabilities.
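To make the persistent-fuzzing idea concrete, here is a toy in-process fuzz loop that snapshots and restores mutable target state between test cases; it only illustrates the state-restoration concept and bears no relation to ClosureX's compiler-level implementation.

```python
import copy
import random

class Target:
    """Toy parser with mutable state that would otherwise leak across runs."""
    def __init__(self):
        self.cache = {}

    def parse(self, data: bytes) -> bool:
        self.cache[len(data)] = data  # state mutation the loop must undo
        return data.startswith(b"FUZZ") and len(data) > 8

def persistent_loop(iterations: int = 1000) -> int:
    target = Target()
    snapshot = copy.deepcopy(target.__dict__)  # fine-grained state snapshot
    crashes = 0
    for _ in range(iterations):
        data = bytes(random.getrandbits(8) for _ in range(random.randint(1, 16)))
        try:
            target.parse(data)
        except Exception:
            crashes += 1
        target.__dict__ = copy.deepcopy(snapshot)  # restore between test cases
    return crashes

print(persistent_loop(100))
```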
Finite Scalar Quantization: VQ-VAE Made Simple
Published: 9/27/2023
Finite Scalar Quantization · Simplified VQ-VAE Method · Autoregressive Image Generation · Multimodal Generation · Depth Estimation and Image Classification
This paper introduces Finite Scalar Quantization (FSQ) as a simpler alternative to VQ in VQ-VAEs, enabling implicit codebook creation. FSQ achieves competitive performance across tasks while avoiding codebook collapse and reducing complexity.
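A minimal sketch of the finite-scalar-quantization idea: bound each latent channel and round it to a small number of levels, so the codebook is implicit. Training details such as the straight-through gradient are omitted, and the tanh bounding here follows the common description rather than the paper's exact formulation.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each latent channel to one of `levels[i]` values; the codebook
    is implicit, with prod(levels) possible codes."""
    half = (np.asarray(levels, dtype=float) - 1) / 2.0
    bounded = np.tanh(z) * half   # squash channel i into [-half_i, half_i]
    return np.round(bounded)      # non-differentiable; training would add STE

z = np.random.randn(4)
print(fsq_quantize(z, levels=[8, 5, 5, 5]))  # 8*5*5*5 = 1000 implicit codes
```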
Rock Classification Based on Residual Networks
Published: 2/19/2024
Residual Networks · Rock Classification · Data Augmentation Methods · Multi-Head Self-Attention · Bottleneck Transformer Block Influence
This study proposes two methods for rock classification using residual networks, achieving 70.1% and 73.7% accuracy with modifications to ResNet34 and multi-head self-attention. It also explores the impact of bottleneck transformer blocks on performance.
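For readers unfamiliar with residual networks, a toy fully-connected residual block is sketched below; the study itself modifies a convolutional ResNet34 with multi-head self-attention, so this shows only the skip-connection idea, with placeholder weights.

```python
import numpy as np

def residual_block(x, w1, w2):
    """y = x + F(x): the identity shortcut that defines a residual block.
    A toy fully-connected version; real ResNets use convolutions and batch norm."""
    h = np.maximum(w1 @ x, 0.0)  # ReLU(W1 x)
    return x + w2 @ h            # shortcut added to the branch output

x = np.random.randn(64)
w1, w2 = 0.1 * np.random.randn(64, 64), 0.1 * np.random.randn(64, 64)
y = residual_block(x, w1, w2)
```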
ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
Published: 2/3/2025
Alignment of Simulation and Real-World Physics · Humanoid Whole-Body Skill Learning · Delta Action Compensation Model · Retargeted Human Motion Data · Dynamic Transfer Evaluation
The ASAP framework addresses the dynamics mismatch for humanoid robots with a two-stage approach: pre-training motion-tracking policies in simulation and fine-tuning them with real-world data to achieve agile whole-body skills.
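The tags mention a delta action compensation model; the sketch below shows the generic idea of adding a learned correction to the nominal action to offset a dynamics mismatch. The policy and delta model here are toy stand-ins, not ASAP's networks.

```python
import numpy as np

def corrected_action(state, policy, delta_model):
    """Apply a learned residual ('delta') action on top of the base policy
    so the simulator behaves more like the real system (generic illustration)."""
    a_sim = policy(state)
    return a_sim + delta_model(state, a_sim)  # a ≈ a_sim + Δ(s, a_sim)

# toy stand-ins: a linear policy and a small learned correction
policy = lambda s: 0.5 * s
delta_model = lambda s, a: 0.05 * np.tanh(s + a)
print(corrected_action(np.array([0.2, -0.4, 0.1]), policy, delta_model))
```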
Animation Engine for Believable Interactive User-Interface Robots
Published: 4/1/2005
Animation Engine for User-Interface Robots · Interactive Robot Behavior Synthesis · Family Companion Robot · Smooth Transition Filter · Animation and Expression Synthesis Techniques
This paper presents an animation engine for interactive user-interface robots, integrating believable behaviors with animations through three software components. A case study showcases its use in the family companion robot iCat, enhancing user interaction.
Rock Classification through Knowledge-Enhanced Deep Learning: A Hybrid Mineral-Based Approach
Published: 10/16/2025
Knowledge-Enhanced Rock Classification · Mineral Composition Analysis · 1D Convolutional Neural Network Application · Deep Learning in Geology · Rock Type Identification
This study introduces a knowledge-enhanced deep learning approach for rock classification, integrating geological expertise with spectral analysis. Using a 1D-CNN, accuracy rates reached 98.37% and 97.75%. Results highlighted optimal limestone classification while revealing challenges for the remaining rock types.
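As a hedged illustration of classifying spectra with a 1D-CNN, the sketch below defines a small PyTorch model over a spectral axis; the layer sizes and class count are placeholders, not the architecture reported in the study.

```python
import torch
import torch.nn as nn

class SpectralCNN(nn.Module):
    """Toy 1D-CNN over spectral channels (illustrative sizes only)."""
    def __init__(self, n_channels: int = 1, n_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the spectral axis
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, channels, wavelengths)
        return self.classifier(self.features(x).squeeze(-1))

logits = SpectralCNN()(torch.randn(4, 1, 512))  # 4 spectra, 512 bands each
```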
ReHyAt: Recurrent Hybrid Attention for Video Diffusion Transformers
Published: 1/8/2026
Video Diffusion Models · Transformer Architecture · Hybrid Attention Mechanism · Efficient Attention Mechanism · Video Generation
ReHyAt introduces a Recurrent Hybrid Attention mechanism for video diffusion transformers that reduces attention complexity to linear, enhancing scalability for long sequences. It distills efficiently from existing models at significantly lower training cost while maintaining comparable generation quality.
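To show why linear attention scales to long sequences, here is a generic recurrent (kernelized) linear-attention sketch that carries a running key-value state; it illustrates the complexity argument only and is not ReHyAt's hybrid mechanism.

```python
import numpy as np

def recurrent_linear_attention(q, k, v, eps=1e-6):
    """Process tokens one at a time with a running (d x d_v) state, so cost
    grows linearly with sequence length. q, k, v: (T, d) arrays."""
    phi = lambda x: np.maximum(x, 0.0) + eps  # simple positive feature map
    T, d = q.shape
    state = np.zeros((d, v.shape[1]))         # running sum of phi(k) v^T
    norm = np.zeros(d)                        # running sum of phi(k)
    out = np.zeros_like(v)
    for t in range(T):
        kt, vt, qt = phi(k[t]), v[t], phi(q[t])
        state += np.outer(kt, vt)
        norm += kt
        out[t] = (qt @ state) / (qt @ norm + eps)
    return out

out = recurrent_linear_attention(*(np.random.randn(16, 8) for _ in range(3)))
```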
CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos
Published: 12/13/2025
Diffusion Model Video Restoration · Video Super-Resolution and Restoration · Structure and Motion Restoration · AIGC Video Processing · Temporal Coherence Module
CreativeVR is a diffusion-prior-guided video restoration framework that addresses structural and temporal artifacts in both generative and real videos. Using a deep-adapter approach, it offers flexible precision control, balancing restoration quality against corrective behavior.
MoMa: Skinned motion retargeting using masked pose modeling
Published: 9/14/2024
Shape-aware Motion Retargeting · Skeleton-aware Motion Retargeting · Transformer-based Auto-Encoder · Motion Transfer · Mixamo Dataset
MoMa introduces a novel skinned motion retargeting method that integrates skeleton-aware and shape-aware capabilities, effectively transferring animations across characters with different structures using a transformer-based auto-encoder and a face-based optimizer.
Self-Adapting Improvement Loops for Robotic Learning
Published: 6/7/2025
Self-Adapting Improvement Loop · Online Video Learning · Robotic Task Planning · MetaWorld Tasks · Self-collected Behavior Enhancement
This paper introduces the Self-Adapting Improvement Loop (SAIL), which enhances robotic agents' performance on new tasks through self-collected online experience. It leverages in-domain and internet-scale pretrained video models, showing continuous performance improvements over successive iterations.
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Published: 5/29/2025
Spatial-Temporal Memory for Large Language Models · 3DMem-Bench Benchmark · Dynamic Memory Management and Fusion Model · Embodied Tasks in Multi-Room 3D Environments · Long-Term Memory Reasoning
This study introduces 3DLLM-Mem to enhance long-term spatial-temporal memory in Large Language Models for dynamic 3D environments. It presents 3DMem-Bench for evaluating reasoning capabilities, with experimental results showing significant performance improvements on embodied tasks.
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Published: 6/2/2025
Vision-Language-Action Model · Efficient Robotics Model · Compact Model Design · Consumer-Grade Hardware Deployment
SmolVLA is a compact and efficient vision-language-action model that achieves competitive performance at reduced computational cost, enabling deployment on consumer-grade hardware and promoting broader participation in robotics research through pre-training on community-driven datasets.
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Published: 1/11/2023
Knowledge Editing in Language Models · Representation Denoising and Causal Tracing · In-Parameter Editing of Models · Fact Storage and Parameter Localization · Understanding Mechanisms in Language Models
This study examines the relationship between knowledge localization and model editing in language models. It reveals that the best locations to edit often differ from where causal tracing suggests knowledge is stored, challenging prior assumptions. Ultimately, the choice of where to edit appears largely unrelated to where a fact is localized.
AlpaGasus: Training a Better Alpaca with Fewer Data
Published: 7/17/2023
Data Selection Strategy Based on Large Language Models · Large Language Model Fine-Tuning · High-Quality Data Filtering · Enhanced Instruction Fine-Tuning Capability · AlpaGasus Model
The AlpaGasus model improves performance by using a data selection strategy that filters 9,000 high-quality samples from the original 52,000. It outperforms the original Alpaca and achieves 5.7x faster training, highlighting the importance of data quality over quantity.
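A minimal sketch of score-based data selection in the spirit of the summary: rate each instruction-response pair and keep those above a threshold. The `rate_quality` grader is a hypothetical placeholder for the LLM-based scorer, and the 4.5 cutoff is illustrative.

```python
def rate_quality(example: dict) -> float:
    """Hypothetical stand-in for an LLM that scores an (instruction, response)
    pair, e.g. on a 0-5 scale."""
    raise NotImplementedError("plug in an actual grading model here")

def select_high_quality(dataset, threshold: float = 4.5):
    # Keep only examples rated at or above the threshold, shrinking the
    # training set while (ideally) raising its average quality.
    return [ex for ex in dataset if rate_quality(ex) >= threshold]
```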