Direct Preference Optimization
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Published: 5/29/2023
Direct Preference Optimization · Behavior Control of Large Language Models · Reinforcement Learning from Human Feedback · Preference Alignment · Policy Training Optimization
This paper introduces Direct Preference Optimization (DPO), a method for fine-tuning large unsupervised language models directly on preference data. By exploiting a mapping between reward functions and optimal policies, DPO removes the need for an explicit reward model and reinforcement learning loop, simplifying training while improving response quality.
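The core of DPO is a loss over preference pairs: the policy's log-probability ratios against a frozen reference model act as implicit rewards, and a Bradley-Terry sigmoid turns their margin into a classification loss. A minimal sketch for a single pair, using scalar log-probabilities (the function name and scalar interface are illustrative, not taken from the paper's released code):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under the frozen
    reference policy; beta scales how far the policy may drift
    from the reference.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference policy.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin (Bradley-Terry model).
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy assigns the chosen response a higher log-ratio than the rejected one, the margin is positive and the loss falls below log 2; gradient descent on this quantity therefore pushes probability mass toward preferred responses without ever fitting a separate reward model.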
Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization
Published: 8/27/2025
Conversational Recommender Systems · Direct Preference Optimization · LLM Applications in Recommendation · User Preference Extraction · Text Generation Refinement
This paper introduces an improved Conversational Recommender System method that uses Large Language Models to generate dialogue summaries and recommendations, capturing both explicit and implicit user preferences. Direct Preference Optimization (DPO) is employed to ensure rich content.