The Probability and Statistics seminar is a meeting of the research teams of Prof. Baraud, Prof. Nourdin, Prof. Peccati, Prof. Podolskij and Prof. Thalmaier. Its aim is to present both research works and surveys of mathematical areas of common interest. An archive of talks before 2020 can be seen here.

# Probability & Statistics Seminar

#### Upcoming sessions

- Thursday 22.10.2020, 1pm, Webex
**Lutz Dümbgen (University of Bern)**, *Shape-Constrained Distributional Regression – Stochastic and Likelihood Ratio Order*

**Abstract:** We consider nonparametric bivariate regression with generic observations (X,Y). A possible and often natural assumption is that the conditional distribution of Y, given that X = x, is “increasing” in x. A standard notion of “increasing” would be the usual stochastic order, and we present estimators and asymptotic properties for that setting. A stronger notion of order is the likelihood ratio order which is well-known from mathematical statistics and binary classification. We review this property briefly but in full generality and describe its relation to so-called multivariate total positivity of order 2 (MTP2). Then we present an algorithm to estimate the joint distribution of (X,Y) from empirical data under the sole assumption that the conditional distribution of Y, given that X = x, is increasing in x with respect to likelihood ratio order.

This is joint work with Alexandre Mösching (Bern, Göttingen).

- Thursday 29.10.2020, 1pm, 3.500
**Martin Wahl (Humboldt-Universität zu Berlin)**, *Upper and lower bounds for the estimation of principal components*

**Abstract:** In settings where the number of observations is comparable to the dimension, principal component analysis (PCA) reveals some unexpected phenomena, ranging from eigenprojector inconsistency to eigenvalue upward bias. While such high-dimensional phenomena are now well understood in the spiked covariance model, the goal of this talk is to discuss some extensions for the case of PCA in infinite dimensions.

In such scenarios the spiked covariance model becomes less important and typically different eigenvalue decay assumptions are investigated instead. Our main results show that the behavior of eigenvalues and eigenprojectors of empirical covariance operators can be characterized by the so-called “relative ranks”. The proofs rely on a novel perturbation-theoretic framework, combined with concentration inequalities for sub-Gaussian chaoses in Banach spaces.

If time permits, we will also present corresponding minimax lower bounds for the estimation of eigenprojectors. These are obtained by a van Trees (resp. Cramér-Rao) inequality for invariant statistical models.

- Thursday 19.11.2020, 1pm, TBA
**Sylvain Arlot (Université Paris-Saclay, Orsay)**, *TBA*

**Abstract:** TBA

- Thursday 17.12.2020, 2pm, MNO 5A
**Hélène Halconruy (Télécom ParisTech)**, *From Stein’s method to Stein-Malliavin-Dirichlet structures: a discrete point of view.*

**Abstract:** Stein’s method is a popular technique used to derive upper bounds on distances between probability distributions. Its connection with Malliavin calculus via the integration by parts formula of stochastic analysis, as designed by I. Nourdin and G. Peccati, together with the tools provided by the underlying Markovian-Dirichlet structure, makes it a powerful instrument for establishing probabilistic approximations. On another note, Malliavin calculus, originally developed to provide an infinite-dimensional differential calculus on the Wiener space, was further extended to other processes (Gaussian, Poisson, Rademacher, etc.) and has since reached a certain maturity.

As part of work carried out under the supervision of Laurent Decreusefond, we offer a discrete version: we develop a Malliavin calculus on any denumerable product of probability spaces, and we generalize to a certain extent what is known about Rademacher spaces. This construction also agrees with the preexisting theories in continuous time, and we retrieve the usual Poisson and Brownian Dirichlet structures associated with their respective gradients as limits of the structures induced by our formalism.

This talk, in which the components of our discrete Malliavin calculus will be presented, is motivated by two convergence results: the approximation of the Normal law on the one hand, and of the Gamma law on the other, by functionals of independent random variables; in other words, a discrete analogue of the so-called “Stein-Malliavin criterion” established in Gaussian and Poisson spaces.

- Thursday 25.02.2021, 1pm, TBA
**Richard Samworth (University of Cambridge)**, *TBA*

**Abstract:** TBA

#### Past sessions

- Thursday 15.10.2020, 1pm, Webex
**Ismael Castillo (LPSM)**, *Supremum-norm inference with Bayesian CART*

**Abstract:** This paper affords new insights about Bayesian CART in the context of structured wavelet shrinkage. We show that practically used Bayesian CART priors lead to adaptive rate-minimax posterior concentration in the supremum norm in Gaussian white noise, performing optimally up to a logarithmic factor. To further explore the benefits of structured shrinkage, we propose the g-prior for trees, which departs from the typical wavelet product priors by harnessing correlation induced by the tree topology. Building on supremum norm adaptation, an adaptive non-parametric Bernstein-von Mises theorem for Bayesian CART is derived using multiscale techniques. For the fundamental goal of uncertainty quantification, we construct adaptive confidence bands with uniform coverage for the regression function under self-similarity.

This is joint work with Veronika Rockova (Chicago).

- Thursday 08.10.2020, 2pm, MSA 4.530
**Emmanuel Rio (University of Versailles)**, *About the constants in the deviation inequalities for martingales*

**Abstract:** In this talk, we will give deviation inequalities for martingales or sums of independent random variables. We will start by giving some constants in Fuk-Nagaev type inequalities. Next we will give another approach, which yields more precise results in the case of martingales with finite third moments. Finally we apply estimates of the minimal distances in the central limit theorem to get upper bounds for the tail-quantiles of sums of independent random variables.

- Thursday 01.10.2020, 1pm, Webex
**Gérard Biau (Sorbonne Université)**, *Theoretical Insights into Wasserstein GANs*

**Abstract:** Generative Adversarial Networks (GANs) have been successful in producing outstanding results in areas as diverse as image, video, and text generation. Building on these successes, a large number of empirical studies have validated the benefits of the cousin approach called Wasserstein GANs (WGANs), which brings stabilization in the training process. In the present contribution, we add a new stone to the edifice by proposing some theoretical advances in the properties of WGANs. First, we properly define the architecture of WGANs in the context of integral probability metrics parameterized by neural networks and highlight some of their basic mathematical features. We stress in particular interesting optimization properties arising from the use of a parametric 1-Lipschitz discriminator. Then, in a statistically-driven approach, we study the convergence of empirical WGANs as the sample size tends to infinity, and clarify the adversarial effects of the generator and the discriminator by underlining some trade-off properties. These features are finally illustrated with experiments using both synthetic and real-world datasets.

- Thursday 17.09.2020, 2pm, Webex
**Vincent Rivoirard (CEREMADE, Université Paris Dauphine)**, *Nonparametric inference for Hawkes processes*

**Abstract:** Hawkes processes are widely applied to event-type data with complex dependencies on the past of the process. They are particularly used in seismology, neuroscience, genetics and social network analysis. The goal of this talk is to present recent advances in nonparametric inference for multivariate Hawkes processes. In the first part of this talk, frequentist estimation of Hawkes parameters by using Lasso-type estimators is described. Then, the Bayesian setting is considered. Concentration rates for the posterior distribution under reasonable assumptions on the prior distribution are established, first for linear multivariate Hawkes models, then for nonlinear ones. We also present a simulation study to illustrate our results and to study empirically the inference on functional connectivity graphs of neurons.

- Thursday 10.09.2020, 11am, Webex
**Olivier Lopez (Sorbonne Université)**, *Generalized Pareto regression trees applied to cyber-risk analysis*

**Abstract:** With the rise of the cyber insurance market, there is a need for better quantification of the economic impact of this risk and its rapid evolution. Due to the relatively poor quality and consistency of databases on cyber events, and because of the heterogeneity of cyber claims, evaluating the appropriate premium and/or the required amount of reserves is a difficult task. In this paper, we propose a method based on regression trees to analyze cyber claims to identify criteria for claim classification and evaluation. We particularly focus on severe/extreme claims, by combining a Generalized Pareto modeling (justified by Extreme Value Theory) and a regression tree approach. Combined with an evaluation of the frequency, our procedure allows computations of central scenarios and extreme loss quantiles for a cyber portfolio. Finally, the method is illustrated on a public database.

- Thursday 10.09.2020, 2pm, Webex
**Michel Denuit (Université catholique de Louvain)**, *Risk reduction by conditional mean risk sharing*

**Abstract:** This talk considers the conditional mean risk allocation for independent but heterogeneous losses that are gathered in an insurance pool, as defined by Denuit and Dhaene (2012, Insurance: Mathematics and Economics). The behavior of individual contributions to total losses is studied when the number of participants in the pool increases. It is shown that enlarging the pool is generally beneficial and that there exists a critical number of participants such that collaborative insurance outperforms commercial insurance. The linear fair risk allocation approximating the conditional mean risk sharing rule is identified, providing practitioners with a useful simplification applicable within large pools.
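As a rough illustration of the sharing rule mentioned in this abstract, the sketch below computes conditional mean contributions E[X_i | S = s] for a small pool of independent discrete losses, where S is the total loss. It is a brute-force toy under stated assumptions, not the method of the talk: the function name `conditional_mean_sharing`, the dictionary representation of the loss distributions and the two-participant example are all hypothetical.

```python
from fractions import Fraction
from itertools import product


def conditional_mean_sharing(pmfs):
    """Return {s: [E[X_i | S = s] for each participant i]}.

    pmfs is a list of dicts {loss: probability}, one per participant.
    Losses are assumed independent; probabilities are exact Fractions.
    """
    n = len(pmfs)
    total_mass = {}  # s -> P(S = s)
    weighted = {}    # s -> [E[X_i * 1{S = s}] for each i]
    # Enumerate every joint outcome (feasible only for tiny pools).
    for outcome in product(*(pmf.items() for pmf in pmfs)):
        losses = [loss for loss, _ in outcome]
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        s = sum(losses)
        total_mass[s] = total_mass.get(s, Fraction(0)) + prob
        acc = weighted.setdefault(s, [Fraction(0)] * n)
        for i, loss in enumerate(losses):
            acc[i] += loss * prob
    return {
        s: [w / total_mass[s] for w in weighted[s]]
        for s in total_mass
        if total_mass[s] > 0
    }


# Hypothetical pool: two heterogeneous participants, each losing 0 or 1.
pool = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},    # riskier participant
    {0: Fraction(9, 10), 1: Fraction(1, 10)},  # safer participant
]
shares = conditional_mean_sharing(pool)
for s in sorted(shares):
    print(s, shares[s])
```

By construction the contributions for a given total s add up to s, so the pool's realized loss is always fully allocated among its participants.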

This talk is based on several papers co-authored with Christian Robert from the Laboratory in Finance and Insurance (LFA), CREST, ENSAE, Paris.

- Wednesday 01.07.2020, 3pm, Webex
**Francesco Grotto (SNS, Pisa)**, *Invariant Measures for 2d Incompressible Fluid Dynamics Models*

**Abstract:** The Hamiltonian structure of 2d Euler’s equations and its variants allows the formal derivation of invariant measures from conservation laws. Gaussian and Poissonian invariant measures thus obtained pose nontrivial questions, concerning the singular dynamics they induce and the relations between their very different natures. We will give an overview of classical and more recent results on the topic.

- Thursday 14.05.2020, 5pm, Webex
**Fei Pu (University of Luxembourg)**, *Spatial limit theorems for stochastic heat equation via Poincaré inequality*

**Abstract:** In this talk, I will present spatial limit theorems for the solution to stochastic heat equations, which include ergodicity, a central limit theorem and a Poisson limit theorem. The tool to study these properties is Malliavin calculus, in particular the Poincaré inequality.

- Friday 20.03.2020, 10am, Webex
**Arturo Jaramillo (University of Luxembourg)**, *Quantitative Erdös-Kac theorem for additive functions, a self-contained probabilistic approach*

**Abstract:** The talk will have as starting point the classical Erdös-Kac theorem, a result of great importance in probabilistic number theory, which states that the fluctuations of the standardized number of distinct primes of a uniformly chosen number between one and n are asymptotically Gaussian. Naturally, after the publication of this result, a quantitative version of it was explored by many authors. LeVeque conjectured that the optimal rate of convergence (in the topology of Kolmogorov distance) was of the order $1/\sqrt{\log \log n}$. This was subsequently proved by Turán and Rényi by means of a very clever manipulation of the associated characteristic function. Unfortunately, up to this day, all of the approaches for solving LeVeque’s conjecture (in its full generality) rely on highly non-trivial complex analysis tools, whereas the purely probabilistic tools have only been successfully applied for obtaining non-optimal assessments of the aforementioned rate.
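The classical statement recalled above can be explored numerically. The sketch below counts distinct prime factors with a simple sieve and standardizes them by the usual centering and scaling, log log n. The function names and the choice n = 100000 are illustrative assumptions, and nothing here reproduces the methods of the talk.

```python
import math


def distinct_prime_counts(limit):
    """Return a list omega with omega[k] = number of distinct primes dividing k."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        # omega[p] is still 0 only if no smaller prime divides p, i.e. p is prime.
        if omega[p] == 0:
            for multiple in range(p, limit + 1, p):
                omega[multiple] += 1
    return omega


def standardized_omega(limit):
    """Standardize omega(k), 1 <= k <= limit, by log log(limit).

    Requires limit >= 3 so that log log(limit) is positive.
    """
    omega = distinct_prime_counts(limit)
    center = math.log(math.log(limit))
    scale = math.sqrt(center)
    return [(omega[k] - center) / scale for k in range(1, limit + 1)]


# Empirical mean and variance of the standardized counts for one sample size.
n = 100_000
z = standardized_omega(n)
mean = sum(z) / len(z)
variance = sum((value - mean) ** 2 for value in z) / len(z)
print(f"n = {n}: empirical mean {mean:.3f}, empirical variance {variance:.3f}")
```

Because log log n grows extremely slowly, the empirical law at any computationally feasible n may still look far from Gaussian, which is one way to see why the rate of convergence is a natural question.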

In this talk, we present a new perspective to estimate the distance to a Gaussian distribution (with respect to Kolmogorov and Wasserstein metric), for general additive functions applied to a uniformly chosen number between one and n. Our approach is probabilistic and does not rely on prior knowledge of the underlying characteristic function. Our main result is an optimal Berry-Esseen type bound in the Kolmogorov distance and the Wasserstein distance. In the special case where the additive function is taken to be the prime factors counting function (with and without accounting of multiplicities), we also show Poisson approximations with optimal error bounds in the total variation distance.

- Thursday 12.03.2020, 2pm, MNO 5A
**Cecile Durot (Université Paris Nanterre)**,*Divide and Conquer methods in monotone regression*

**Abstract:** The divide and conquer principle is studied in the isotonic regression problem, where rates of convergence are slower than the square-root of the sample size, and limit distributions are non-Gaussian. For a fixed model, the pooled estimator, obtained by averaging non-standard estimates across mutually exclusive subsamples, outperforms the non-standard monotonicity-constrained (global) estimator based on the entire sample in the sense of pointwise estimation. However, this gain in efficiency under a fixed model comes at a price: the pooled estimator’s performance, in a uniform sense over a class of models, worsens as the number of subsamples increases, leading to a version of the super-efficiency phenomenon. Then, we build a corrected pooled estimator that does not suffer from the super-efficiency phenomenon and allows for some heterogeneity in data. The new estimator essentially reverses the steps involved in constructing the above pooled estimator: we first smooth (by local averaging) on each subsample, and then isotonize the pooled smoothed data. Joint work with Moulinath Banerjee and Bodhisattva Sen.

- Friday 06.03.2020, 10:30am, MNO 5A
**Vlad Margarint (NYU Shanghai)**, *Backward Loewner Differential Equation as a Singular Rough Differential Equation, the welding homeomorphism and new structural information about the SLE traces*

**Abstract:** In this talk, I will give an overview of the Schramm-Loewner Evolutions (SLE) theory and present new results on this theory based on the analysis of a Singular Differential Equation that appears naturally in this context. This equation appears when extending the conformal maps to the boundary and can be thought of as a singular Rough Differential Equation (RDE), as in Rough Path Theory. In the study of RDEs, questions such as continuity of the solutions and the uniqueness/non-uniqueness of solutions depending on the behavior of parameters of the equation appear naturally. We adapt these types of questions to the study of the backward Loewner differential equation in the upper half-plane, and the conformal welding homeomorphism. This view will allow us to obtain some new structural and geometric information about the SLE traces in the regime where they have double points.

This first part is a joint work with Dmitry Belyaev and Terry Lyons.

In the second part, I plan to cover the main ideas of an independent project that uses ideas from Quasi-Sure Stochastic Analysis through Aggregation in order to study SLE theory quasi-surely. This quasi-sure study will allow us to overcome some of the difficulties with the previous analysis that I will emphasize throughout the talk.

- Thursday 05.03.2020, 1pm, MNO 5A
**Benjamin Arras (Université de Lille)**, *From generalized Mehler semigroups to stability results for Poincaré-type inequalities*

**Abstract:** In this talk, I will present some recent results around Stein’s method for multivariate stable laws and generalized Mehler semigroups. This is based on joint works with Christian Houdré (GaTech).

- Monday 2.03.2020, 2pm-2:45pm, MNO 5A
**Guenter Last (Karlsruhe Institute of Technology)**, *Unbiased embedding of excursions into Brownian motion*

**Abstract:** In this talk, we discuss an embedding problem for a two-sided Brownian motion. We consider an excursion event with positive and finite Itô-measure and construct a stopping time such that the two-sided Brownian motion centered around this stopping time splits into three independent pieces: a time-reflected Brownian motion on , an excursion distributed according to a conditional Itô law (given the excursion event) and a Brownian motion starting after this excursion. The proof relies on Palm theory for random measures and on excursion theory. Therefore we shall begin with a short review of some fundamental facts on invariant balancing transports of random measures. This talk is based on joint work with Wenpin Tang and Hermann Thorisson.

- Monday 2.03.2020, 2:50pm-3:35pm, MNO 5A
**D. Yogeshwaran (ISI, Bangalore)**, *Random minimal spanning acycles.*

**Abstract:** It is well-known that extremal edge-weights on a minimal spanning tree, nearest-neighbour distances and connectivity threshold are inter-related for randomly weighted graphs. In this talk, we shall look at a generalization of this result to randomly weighted simplicial complexes. The first part of the talk shall be about defining spanning acycles and establishing them as a natural topological generalisation of spanning trees. We shall give Kruskal’s algorithm to generate minimal spanning acycles. As a consequence of Kruskal’s algorithm, we shall obtain a connection between minimal spanning acycles and persistent homology. We shall explore applications of these results in the context of random d-complexes, paying particular attention to extremal face-weights of the minimal spanning acycles on a complete d-complex with i.i.d. face weights. This is joint work with Primoz Skraba and Gugan Thoppe. Time permitting, I will sketch some on-going work with Primoz Skraba on Euclidean minimal spanning acycles.

- Thursday 13.02.2020, 2pm, MNO 5A
**Guillaume Maillard (Université Paris-Sud)**, *Aggregated hold-out*

**Abstract:** Aggregated hold-out (Agghoo) is a hyperparameter aggregation method which averages learning rules selected by hold-out (i.e., cross-validation with 1 split). Theoretical guarantees on Agghoo ensure that one can use it safely: for a convex risk, at worst, Agghoo performs like the hold-out. For the hold-out, oracle inequalities are known for bounded losses, as in binary classification. We show that classical methods can be extended, under appropriate assumptions, to some unbounded risk-minimization problems. In particular, we obtain an oracle inequality in sparse linear regression with Huber loss, without requiring the variable to be bounded or using truncation. To further investigate the effects of aggregation on performance, we conduct some numerical experiments. They show that aggregation brings a significant improvement over the hold-out. Compared to cross-validation, Agghoo appears to perform better when the intrinsic dimension is sufficiently high, and when there are correlations between predictive and noise covariates.

- Thursday 06.02.2020, 2pm, MNO 5A
**Richard Nickl (University of Cambridge)**, *On Bayesian solutions of some statistical inverse boundary value problems*

**Abstract:** We discuss Bayesian inference in a class of statistical non-linear inverse problems arising with partial differential equations (PDEs). The main mathematical idea behind non-invasive tomography methods is related to the fact that observations of boundary values of the solutions of certain PDEs can in certain cases determine the parameters governing the dynamics of the PDE also in the interior of the domain in question. The parameter-to-data maps in such settings are typically non-linear, as with the Calderón problem (relevant in electric impedance tomography) or with non-Abelian X-ray transforms (relevant in neutron spin tomography). Real-world discrete data in such settings carries statistical noise, and Bayesian inversion methodology has been extremely popular in computational and applied mathematics in the last decade after seminal contributions by Andrew Stuart (2010) and others. In this talk we will discuss recent progress which provides rigorous statistical guarantees for such inversion algorithms in the large sample/small noise limit.

- Friday 31.01.2020, 10:35am, MNO 5A
**Xiao Fang (The Chinese University of Hong Kong)**, *Wasserstein-2 bounds in normal approximations under local dependence with applications to strong embeddings*

**Abstract:** We obtain a general bound for the Wasserstein-2 distance in normal approximation for sums of locally dependent random variables. The proof is based on an asymptotic expansion for expectations of second-order differentiable functions of the sum. We apply the main result to obtain Wasserstein-2 bounds in normal approximation for sums of $m$-dependent random variables, U-statistics and subgraph counts in the Erdös-Rényi random graph. We also discuss an application to strong embeddings.

- Thursday 30.01.2020, 2pm, MNO 5A
**Alexandre Moesching (University of Bern)**, *Order Constraints in Nonparametric Regression*

**Abstract:** Imposing a nonparametric qualitative constraint in a statistical model has shown its benefit on several occasions, for example in circumstances where a parametric model is hard to justify but a qualitative constraint on the distribution is natural. We consider a stochastic ordering constraint on an unknown family of distributions , with a fixed subset , and discuss nonparametric estimation procedures based on a sample such that, conditional on the the are independent random variables with distribution functions .