Joey Bose
Post-Doctoral Fellow, University of Oxford
Speaker: Joey Bose (University of Oxford, UK)
Date: 13-02-2025, 2pm-3pm (BST)
Location: Mathematical Sciences Building, MB0.01, University of Warwick, Coventry, UK
Theoretical Foundations of Self-Consuming Generative Models
Abstract
The rapid progress in generative models has resulted in impressive leaps in generation quality, blurring the lines between synthetic and real data. Web-scale datasets are now prone to inevitable contamination by synthetic data, which directly impacts the training of future generative models. In this talk, I will start from the basics of generative models that are trained on their own data and prove conditions for model collapse or stable iterative retraining, depending on the fraction of generated data used at each retraining step. In addition, I will connect the theory to the current practice of iterative retraining after a human-in-the-loop data curation step. For instance, many interfaces of popular text-to-image generative models, such as Stable Diffusion or Midjourney, produce several variations of an image for a given query, which users can then curate. I will then give an overview of a series of theoretical results showing that data curation in the iterated retraining of generative models can be seen as an implicit preference optimization mechanism. Finally, I will outline illustrative experiments on both synthetic datasets and CIFAR10 showing that such a procedure amplifies the biases of the reward model.
About Joey Bose
Joey Bose is a Post-Doctoral Fellow at the University of Oxford working with Michael Bronstein and an Affiliate Member of Mila. He completed his PhD at McGill/Mila under the supervision of Will Hamilton, Gauthier Gidel, and Prakash Panagaden. His research interests span generative modelling and differential geometry for machine learning, with a current emphasis on geometric generative models for scientific applications. Previously, he completed his Bachelor’s and Master’s degrees at the University of Toronto, working on adversarial attacks against face detection, and he is the President and CEO of FaceShield Inc., an educational platform for digital privacy for facial data. His work has been featured in Forbes, the New York Times, CBC, VentureBeat, and other media outlets, and was generously supported by the IVADO PhD Fellowship and the NSERC Post-doc Fellowship.