Reading Market
MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
Published: 3/30/2026
MonitorBench is introduced as a systematic benchmark to assess Chain-of-Thought (CoT) monitorability in large language models, featuring 1,514 test instances and stress tests to quantify variations in monitorability, showing structural reasoning enhances CoT monitorability, with
CREval: An Automated Interpretable Evaluation for Creative Image Manipulation under Complex Instructions
Published: 3/27/2026
CREval introduces an automated QA-based evaluation pipeline and the CREvalBench benchmark for assessing creative image manipulation under complex instructions, ensuring interpretability and alignment with human judgments, thus addressing existing evaluation challenges.
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
Published: 3/28/2026
This study introduces VGGRPO, a framework that enhances geometric consistency and camera stability in video generation using 4D latent rewards. By integrating a Latent Geometry Model, it improves dynamic scene generation and eliminates costly VAE decoding, significantly boosting
Colon-Bench: An Agentic Workflow for Scalable Dense Lesion Annotation in Full-Procedure Colonoscopy Videos
Published: 3/27/2026
This paper introduces Colon-Bench, a novel multi-stage agentic workflow addressing dense lesion annotation in colonoscopy videos, comprising 528 videos and extensive annotations. It rigorously evaluates state-of-the-art Multimodal Large Language Models, showing promising results
Falcon Perception
Published: 3/29/2026
Falcon Perception is a unified dense Transformer architecture that replaces modular encoder-decoder systems, enhancing perception and task modeling efficiency. It achieves a mask quality of 68.0 Macro-F1 on SA-Co, surpassing existing methods, and introduces the PBench benchmark f
AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing
Published: 3/27/2026
AutoWeather4D is a novel framework for autonomous driving video weather editing that decouples geometry and illumination using a G-buffer dual-pass editing mechanism. It achieves comparable photorealism to generative models while enabling fine control, serving as a practical data
Learn2Fold: Structured Origami Generation with World Model Planning
Published: 2/2/2026
Learn2Fold introduces a neuro-symbolic framework that models origami generation as conditional program induction on crease-pattern graphs. It decouples semantic proposal from physical verification, allowing effective folding sequences to be generated directly from natural languag
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization
Published: 3/31/2026
CutClaw is an autonomous multi-agent framework that efficiently edits hours of raw video into high-quality short videos through music synchronization, employing hierarchical multimodal decomposition and agent collaboration to enhance narrative consistency and aesthetic standards.
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
Published: 3/31/2026
Unify-Agent is a unified multimodal agent designed for world-grounded image synthesis, addressing the limitations of existing models by dynamically interpreting prompts and retrieving multimodal evidence, significantly improving image generation quality.
GEMS: Agent-Native Multimodal Generation with Memory and Skills
Published: 3/30/2026
GEMS is a proposed framework that enhances multimodal generation by integrating a structured multi-agent system, persistent memory, and domain-specific skills, significantly improving performance on complex instructions and specialized tasks.
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Published: 3/30/2026
CARLA-Air is an open-source platform that integrates high-fidelity urban driving and accurate UAV flight in a single Unreal Engine process, enabling joint modeling of air-ground agents while maintaining the original interfaces of CARLA and AirSim.
nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation
nnU-Net is a self-configuring deep learning method for biomedical image segmentation that automates the entire pipeline, outperforming most specialized methods on 23 public datasets, thus lowering technical barriers and enhancing segmentation performance.
Tokenized Heterogeneous Graph Transformer with Enhanced Local and Global Representation Learning
The paper introduces THFormer, a novel tokenized heterogeneous graph transformer that explicitly models local heterogeneity and effectively captures fine-grained global information. It employs alignment loss for training stability and outperforms existing models on multiple bench
Heterogeneous Graph Transformer with Poly-Tokenization
This study introduces the Poly-tokenized Heterogeneous Graph Transformer (PHGT), enhancing heterogeneous graph modeling by integrating semantic and global tokens, effectively addressing the limitations of traditional graph neural networks in capturing semantics and long-range dep
MWFNet: A multi-level wavelet fusion network for hippocampal subfield segmentation
MWFNet is introduced as a novel deep learning model for automatic segmentation of hippocampal subfields in MRI. It utilizes multi-level wavelet transforms and multi-scale attention mechanisms to address challenges such as small sizes and unclear boundaries, outperforming existing
Model Checking Guided Incremental Testing for Distributed Systems
The paper introduces iMocket, an incremental testing method that reduces costs associated with model checking guided testing of evolving distributed systems, resulting in an average decrease of 74.83% in test cases and a reduction of 22.54% to 99.99% in testing time.
TrafficFormer: An Efficient Pre-trained Model for Traffic Data
TrafficFormer is an efficient pre-trained model for traffic data that enhances analysis accuracy. It employs a fine-grained multi-classification task during pre-training and a random initialization feature for data augmentation, achieving a 10% improvement in F1 score across six
ProPhy: Progressive Physical Alignment for Dynamic World Simulation
Published: 12/5/2025
ProPhy introduces a framework to enhance physical consistency in video generation models. It employs a two-stage mechanism for extracting fine-grained physical priors, significantly improving the realism of generated videos, particularly in complex dynamics.
SecHeadset: A Practical Privacy Protection System for Real-time Voice Communication
SecHeadset is a user-controlled privacy protection system for real-time voice communication. It utilizes voice obfuscation to mask original speech and enables secure information exchange, demonstrated to effectively reduce voice recognition accuracy while maintaining communicatio
Large Language Models on Graphs: A Comprehensive Survey
This paper provides a systematic survey on the applications of large language models (LLMs) on graphs, categorizing them into pure graphs, text-attributed graphs, and text-paired graphs, while discussing techniques and their advantages, offering a framework for future research.
…