Dr. Chengchun Shi
London School of Economics and Political Science (LSE)
Speaker: Dr. Chengchun Shi
Date: 20 May 2025, 2pm-3pm (BST)
Location: Mathematical Sciences Building, MB0.08, University of Warwick, Coventry, UK
Robust Reinforcement Learning from Human Feedback for Large Language Model Fine-Tuning
Abstract
Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. In this paper, we propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications. Theoretically, our algorithm reduces the variance of reward and policy estimators, leading to improved regret bounds. Empirical evaluations on LLM benchmark datasets demonstrate that the proposed algorithm consistently outperforms existing methods, with 77-81% of responses being favored over baselines on the Anthropic Helpful and Harmless dataset.
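For context on the Bradley-Terry model mentioned in the abstract: it assumes the probability that one response is preferred over another is a sigmoid of the difference in their (latent) reward scores, and reward learning then maximizes the likelihood of observed preference pairs. The sketch below is illustrative only (the scores and the helper names are hypothetical, not from the talk):

```python
import math

def bradley_terry_prob(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference probability:
    P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))

def bt_negative_log_likelihood(pairs):
    """Reward-learning objective: average negative log-likelihood of the
    observed preferences. `pairs` holds (reward_chosen, reward_rejected)
    scores produced by a reward model on annotated preference pairs."""
    return -sum(math.log(bradley_terry_prob(rc, rr)) for rc, rr in pairs) / len(pairs)

# Hypothetical reward-model scores for two preference pairs.
loss = bt_negative_log_likelihood([(1.2, -0.3), (0.4, 0.9)])
```

The talk concerns what happens when this sigmoid-of-reward-differences assumption is misspecified, i.e. when real human judgments do not follow this functional form.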
About Dr. Chengchun Shi
Chengchun is an Associate Professor in the Department of Statistics at LSE. He works at the interface of RL, LLMs and statistics, with applications to ride-sharing and healthcare. His work brings to light the relevance and significance of statistical learning in RL, and demonstrates the usefulness of RL as a framework for policy evaluation and A/B testing in two-sided marketplaces. Chengchun has published approximately 70 papers, around half of them in prestigious statistical journals (JRSSB, JASA, AoS) and top machine learning venues (NeurIPS, ICML, KDD, JMLR). His contributions have been recognized with awards including the Peter Gavin Hall IMS Early Career Prize, the IMS Tweedie Award and the Royal Statistical Society Research Prize. He serves as an associate editor for JRSSB, JASA and AoAS, and as a reviewer for a range of machine learning conferences, including NeurIPS, ICML, ICLR, AAAI, AISTATS and KDD.