Ilija Bogunovic

Assistant Professor, UCL, UK

Speaker: Ilija Bogunovic (Assistant Professor, UCL, UK)

Date: 22-10-2024, 2pm-3pm (BST)

Location: Department of Computer Science, CS1.04, University of Warwick, Coventry, UK

Robust and Efficient AI Alignment


Abstract

Aligning large language models (LLMs) with human values, ethical principles, and user intentions is critical to building AI systems such as ChatGPT that drive transformative, human-centric progress. It is also a key step toward ensuring the sustainable and safe development of artificial general intelligence (AGI) that genuinely benefits humanity. Reinforcement learning from human feedback (RLHF) has become the leading method for fine-tuning LLMs: it uses human preferences to guide model behavior and optimize responses. However, RLHF faces two pressing challenges: inefficient data use and an overly simplified treatment of the complex, pluralistic nature of human societies. RLHF-tuned models frequently exhibit bias, favoring majority perspectives while overlooking minority voices, which highlights the need for principled algorithms that address these shortcomings. In this talk, we present efficient and pluralistic alignment algorithms that outperform standard RLHF methods. Our approach tackles data acquisition inefficiencies, improves bias resilience, and aligns LLMs with a wider range of human perspectives. By integrating RLHF with sequential decision-making frameworks, we significantly boost data efficiency and develop algorithms capable of navigating the complexities of aligning LLMs with diverse societal values. We demonstrate the effectiveness of our algorithms in fine-tuning popular LLMs.


About Ilija Bogunovic

Ilija Bogunovic has been a lecturer at University College London (UCL) since 2022, where he leads a research group focused on the intersection of sequential decision-making and generative models. Prior to joining UCL, Ilija completed his PhD at EPFL and a postdoc at ETH Zurich. His recent achievements include receiving the prestigious EPSRC New Investigator Award and, for the first time at UCL, the Google Research Scholar Award.