
Statistics and Data Science Seminar Series

The Department of Statistics hosts the Statistics and Data Science Seminar Series (SDSS) throughout the year, usually on Friday afternoons at 2pm. Topics span statistics, machine learning, computer science and their interface, from both theoretical and applied points of view. We invite both internal and external speakers to present their latest cutting-edge research. All are welcome to attend our seminars!

Winter Term 2025 

Friday 17 January 2025, 2-3pm - Nicolas Verzelen (INRAE)


The talk will be in COL.1.06.

Title: Computation-information gap in high-dimensional clustering.

Abstract: We investigate the existence of a fundamental computation-information gap for the problem of clustering a mixture of isotropic Gaussians in the high-dimensional regime, where the ambient dimension p is larger than the number n of points. The existence of a computation-information gap in a specific Bayesian high-dimensional asymptotic regime was conjectured by Lesieur et al. (2016) based on the replica heuristic from statistical physics. We provide evidence of the existence of such a gap generically in the high-dimensional regime p > n, by (i) proving a non-asymptotic low-degree polynomials computational barrier for clustering in high dimension, matching the performance of the best known polynomial-time algorithms, and by (ii) establishing that the information barrier for clustering is smaller than the computational barrier when the number K of clusters is large enough. These results are in contrast with the (moderately) low-dimensional regime n > poly(p, K), where there is no computation-information gap for clustering a mixture of isotropic Gaussians. This is based on joint work with Bertrand Even and Christophe Giraud (Paris-Saclay).

Biography: Website 

Friday 24 January 2025, 2-3pm - Peter Orbanz (UCL)

Title: Gaussian and non-Gaussian universality, and applications to data augmentation.

Abstract: The term Gaussian universality refers to a class of results that are, loosely speaking, generalized central limit theorems (where, somewhat confusingly, the limit law is not necessarily Gaussian). They provide useful tools to study certain problems in machine learning. I will give a short overview of this idea and present two types of results: One are upper and lower bounds that map out where Gaussian universality is applicable and what rates of convergence one can expect. The other is the use of these techniques to obtain quantitative results on the effects of data augmentation in machine learning problems.

This is joint work with KH Huang (Gatsby Unit) and M Austern (Harvard).

Biography: Website

Friday 31 January 2025, 2-3pm - François-Xavier Briol (UCL)

Title: Robust and Conjugate Gaussian Process Regression.

Abstract: To enable closed-form conditioning, a common assumption in Gaussian process (GP) regression is independent and identically distributed Gaussian observation noise. This strong and simplistic assumption is often violated in practice, which leads to unreliable inferences and uncertainty quantification. Unfortunately, existing methods for robustifying GPs break closed-form conditioning, which makes them less attractive to practitioners and significantly more computationally expensive. In this work, we demonstrate how to perform provably robust and conjugate Gaussian process (RCGP) regression at virtually no additional cost using generalised Bayesian inference. RCGP is particularly versatile as it enables exact conjugate closed-form updates in all settings where standard GPs admit them. To demonstrate its strong empirical performance, we deploy RCGP for problems ranging from Bayesian optimisation to sparse variational Gaussian processes.
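For background, the closed-form conditioning mentioned in the abstract refers to the standard GP posterior under i.i.d. Gaussian noise. The following is a minimal sketch of that standard construction only (not the RCGP method presented in the talk), assuming a squared-exponential kernel and a toy 1-D dataset chosen purely for illustration:

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential (RBF) kernel matrix between point sets a and b
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

# Toy 1-D data with i.i.d. Gaussian observation noise
# (exactly the assumption that RCGP is designed to relax)
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
Xs = np.linspace(0.0, 5.0, 100)   # test inputs
noise_var = 0.1 ** 2

# Closed-form GP posterior:
#   mean = K(Xs, X) (K(X, X) + s^2 I)^-1 y
#   cov  = K(Xs, Xs) - K(Xs, X) (K(X, X) + s^2 I)^-1 K(X, Xs)
K = rbf(X, X) + noise_var * np.eye(len(X))
Ks = rbf(Xs, X)
L = np.linalg.cholesky(K)                       # stable solve via Cholesky
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
post_mean = Ks @ alpha
v = np.linalg.solve(L, Ks.T)
post_cov = rbf(Xs, Xs) - v.T @ v
```

The posterior mean and covariance follow from a single linear solve against the noisy kernel matrix; methods that robustify the likelihood typically lose this conjugacy, which is the cost the abstract highlights.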

Biography: Website