- Thursday 08.06.2023, 1pm, MNO 1020
Laura Freijeiro Gonzales (University of Santiago de Compostela), Covariates selection procedures for regression models in high dimension
Abstract: In a regression model, covariate selection techniques are of particular interest when there is a large number of explanatory covariates. These techniques simultaneously reduce the dimensionality of the problem and select only the relevant terms. Nevertheless, the usual procedures may require some model assumptions and start to perform poorly for large values of p, especially in the high-dimensional situation p>n. To overcome these drawbacks, penalty techniques or independence tests using novel distance-based dependence measures can be applied. In the first case, procedures such as the well-known LASSO regression can be employed to select covariates and estimate the regression model simultaneously. Alternatively, if one does not want to assume any model structure, one can resort to the distance covariance (DC) measure of dependence. This coefficient can be used in independence tests to detect relevant terms as a preliminary step to estimation. In this talk, we will discuss both points of view, comparing the LASSO and DC procedures with some interesting modifications and alternatives. Finally, we will illustrate the performance of these methodologies on real data problems in high dimensions.
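To make the model-free idea in the abstract concrete, here is a minimal sketch of the sample distance correlation (the normalised form of distance covariance) for scalar samples, used as a screening score. It is not the speaker's procedure: the function names, the simulated data and the choice of the biased V-statistic estimator are assumptions made for illustration only.

```python
import math
import random


def _centered_distances(v):
    """Double-centred matrix of pairwise absolute differences |v_i - v_j|."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (biased V-statistic version)."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def distance_correlation(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    vxy = distance_covariance_sq(x, y)
    denom = math.sqrt(distance_covariance_sq(x, x) * distance_covariance_sq(y, y))
    return math.sqrt(max(vxy, 0.0) / denom) if denom > 0 else 0.0


# Hypothetical data: y depends on x1 only, and only through x1**2,
# so the linear correlation between x1 and y is close to zero.
random.seed(1)
n = 200
x1 = [random.uniform(-1.0, 1.0) for _ in range(n)]
x2 = [random.uniform(-1.0, 1.0) for _ in range(n)]  # irrelevant covariate
y = [a * a + 0.05 * random.gauss(0.0, 1.0) for a in x1]

score_relevant = distance_correlation(x1, y)
score_noise = distance_correlation(x2, y)
print(f"dcor(x1, y) = {score_relevant:.3f}, dcor(x2, y) = {score_noise:.3f}")
```

In a screening step one would rank covariates by such a score, or calibrate it with a permutation test, before fitting any regression model. A penalised approach such as the LASSO would instead select and estimate in a single step under a linear model.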
Academic Year 2022-2023
- Thursday 11.05.2023, 1pm, MNO 1020
Kelly Van Lancker (Ghent University), Ensuring valid inference for conditional causal hazard ratios after variable selection
Abstract: The analysis of randomized trials with time-to-event endpoints is nearly always plagued by the problem of censoring. As the censoring mechanism is usually unknown, analyses typically employ the assumption of non-informative censoring. While this assumption usually becomes more plausible as more baseline covariates are adjusted for, such adjustment also raises concerns. Pre-specification of which covariates will be adjusted for (and how) is difficult, thus prompting the use of data-driven variable selection procedures, which may prevent valid inferences from being drawn. Adjustment for covariates moreover adds concerns about model misspecification, and about the fact that each change in the adjustment set also changes the censoring assumption and the treatment effect estimand. In the first half of my talk, I will discuss these concerns and propose a simple variable selection strategy that aims to produce a valid test of the null in large samples. In the second part of the talk, I will consider the problem of how best to select variables for inferring the conditional causal hazard ratio with observational data. The major complication that we face with survival data is that the variables that should be selected to adjust for confounding may not be those that explain the censoring mechanism. We overcome this problem using a novel three-step procedure that can be implemented using existing software for penalized Cox regression. In particular, we will propose tests of the null hypothesis that the exposure has no effect on the considered survival endpoint, which are uniformly valid under specific sparsity conditions, along with corresponding uniformly valid confidence intervals. Such uniform validity will be achieved by relying on specific sparse estimators that target the regularization bias inherited by a naïve post-selection partial likelihood estimator.
Simulation results show that the proposed methods yield valid inferences even when the number of covariates exceeds the sample size.
- Thursday 27.04.2023, 1pm, MNO 1040
Rainer von Sachs (UCLouvain), Statistical inference for intrinsic wavelet estimators of covariance matrices in a log-Euclidean manifold
Abstract: In this talk we treat statistical inference for an intrinsic wavelet estimator of curves of symmetric positive definite (SPD) matrices in a log-Euclidean manifold. Examples of these arise in Diffusion Tensor Imaging and related medical imaging problems, as well as in computer vision and neuroscience.
Our proposed wavelet (kernel) estimator preserves positive-definiteness and enjoys permutation-equivariance, which is particularly relevant for covariance matrices. Our second-generation wavelet estimator is based on average-interpolation and allows the same powerful properties, including fast algorithms, known from nonparametric curve estimation with wavelets in standard Euclidean set-ups.
At the heart of this talk is the proposition of confidence sets based on our wavelet estimator in a non-Euclidean geometry. We derive asymptotic normality of this estimator, including explicit expressions of its asymptotic variance. This opens the door for constructing asymptotic confidence regions which we compare with our proposed bootstrap scheme for inference. Detailed numerical simulations confirm the appropriateness of our suggested inference schemes. Finally, time permitting, first empirical results for more adaptive non-linear threshold estimates will be discussed, too.
- Thursday 13.04.2023, 1pm, MNO 1020
Yvik Swan (Université Libre de Bruxelles), On the distance between discrete and continuous random variables on the real line
Abstract: We revisit Stein’s method of approximate computation of expectations, here with the aim of comparing a discrete distribution on some ordered set {x_1,…, x_n} ⊂ R and a continuous distribution on some interval I⊂ R. We obtain abstract bounds on Wasserstein and Kolmogorov distances which we apply to several examples, including (1) exponential approximation of the spectrum of the Johnson graph, (2) beta approximation of the Polya-Eggenberger distribution, and (3) normal approximation of a distribution arising in the many worlds interpretation of quantum mechanics. This is joint work with Gilles Germain (ULB).
- Thursday 06.04.2023, 1pm, MNO 1040
Cecilia Mancini (University of Verona), Drift burst test statistic in the presence of infinite variation jumps
Abstract: We consider the test statistic devised by Christensen, Oomen and Renò in 2020 to obtain insight into the causes of flash crashes occurring at particular moments in time in the price of a financial asset. Under an Itô semimartingale model containing a drift component, a Brownian component and finite variation jumps, it is possible to identify when the cause is a drift burst (the statistic explodes) or otherwise (the statistic is asymptotically Gaussian). We complete the investigation by showing how infinite variation jumps contribute asymptotically. The result is that the jumps never cause the explosion of the statistic. Specifically, when there are no bursts, the statistic diverges only if the Brownian component is absent, the jumps have finite variation and the drift is non-zero. In this case the trigger is precisely the drift. We also find that the statistic could be adopted for a variety of tests useful for investigating the nature of the data generating process, given discrete observations.
- Thursday 23.03.2023, 1pm, MNO 1040
Germain Van Bever (University of Namur), Additive regression with general imperfect variables
Abstract: In this talk, we present an additive model where the response variable is Hilbert-space-valued and the predictors are multivariate Euclidean, and both are possibly imperfectly observed. Considering Hilbert-space-valued responses allows us to cover Euclidean, compositional, functional and density-valued variables. By treating imperfect responses, we can cover functional variables taking values in a Riemannian manifold and the case where only a random sample from a density-valued response is available. This treatment can also be applied in semiparametric regression. Dealing with imperfect predictors allows us to cover various principal component and singular component scores obtained from Hilbert-space-valued variables. For the estimation of the additive model with such variables, we use the smooth backfitting method originated by Mammen et al. (1999). We provide full non-asymptotic and asymptotic properties of our regression estimator and present its wide applicability via several simulation studies and real data applications.
- Thursday 23.02.2023, 1pm, MNO 1020
Gonçalo dos Reis (University of Edinburgh), High order splitting methods for stochastic differential equations
Abstract: In this talk, we will discuss how ideas from rough path theory can be leveraged to develop high order numerical methods for SDEs. To motivate our approach, we consider what happens when the Brownian motion driving an SDE is replaced by a piecewise linear path. We show that this procedure transforms the SDE into a sequence of ODEs, which can then be discretized using an appropriate ODE solver. Moreover, to achieve high accuracy, we construct these piecewise linear paths to match certain “iterated” integrals of the Brownian motion. At the same time, the ODE sequences obtained from this path-based approach can be interpreted as a splitting method, which neatly connects our work to the existing literature. For example, we show that the well-known Strang splitting falls under this framework and can be modified to give an improved convergence rate. We will conclude the talk with a couple of examples, demonstrating the flexibility and convergence properties of our methodology. (This is joint work with James Foster and Calum Strange, https://arxiv.org/abs/2210.17543)
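As a rough illustration of the splitting idea mentioned in the abstract, the sketch below applies a plain Strang splitting to a scalar SDE with additive noise. It is not the high order method of the talk: the Ornstein-Uhlenbeck test equation dX = -theta X dt + sigma dW, the parameter values and the function names are all assumptions chosen so that each sub-flow can be solved exactly.

```python
import math
import random


def strang_step(x, h, dw, theta=1.0, sigma=0.5):
    """One Strang step for dX = -theta * X dt + sigma dW (assumed test equation).

    Half a step of the drift ODE, a full step of the noise, half a step of drift.
    """
    x = x * math.exp(-theta * h / 2.0)  # exact flow of dx/dt = -theta * x over h/2
    x = x + sigma * dw                  # exact flow of the noise part over the step
    x = x * math.exp(-theta * h / 2.0)  # second half step of the drift flow
    return x


def simulate(x0, T, n_steps, theta=1.0, sigma=0.5):
    """Simulate X_T from X_0 = x0 with n_steps Strang steps."""
    h = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = random.gauss(0.0, math.sqrt(h))  # Brownian increment over [t, t + h]
        x = strang_step(x, h, dw, theta, sigma)
    return x


# Monte Carlo check of the mean: for this linear equation E[X_T] = x0 * exp(-theta * T).
random.seed(0)
paths = [simulate(1.0, 1.0, 20) for _ in range(2000)]
mc_mean = sum(paths) / len(paths)
print(f"Monte Carlo mean {mc_mean:.3f} vs exact mean {math.exp(-1.0):.3f}")
```

For a nonlinear drift the half steps would be replaced by an ODE solver. The piecewise linear paths matching iterated integrals described in the abstract are not reproduced here.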
- Thursday 02.02.2023, 1pm, MNO 1040
Ingrid Van Keilegom (KU Leuven), Dependent censoring based on parametric copulas
Abstract: Consider a survival time T that is subject to random right censoring, and suppose that T is stochastically dependent on the censoring time C. We are interested in the marginal distribution of T. This situation is often encountered in practice. Consider for instance the case where T is the time to death of a patient suffering from a certain disease. Then, the censoring time C is for instance the time until the person leaves the study or the time until he/she dies from another disease. If the reason for leaving the study is related to the health condition of the patient or if he/she dies from a disease that has similar risk factors to the disease of interest, then T and C are likely dependent. In this paper we propose a new model that takes this dependence into account. The model is based on a parametric copula for the relationship between T and C, and on parametric marginal distributions for T and C. Unlike most other papers in the literature, we do not assume that the parameter defining the copula is known. We give sufficient conditions on the parametric copula and marginals under which the bivariate distribution of (T,C) is identified. These sufficient conditions are then checked for a wide range of common copulas and marginals. We also study the estimation of the model, and carry out extensive simulations and an analysis of data on pancreatic cancer to illustrate the proposed model and estimation procedure. At the end of the talk a number of extensions (to covariates, left truncation, semiparametric models, confounding factors, …) will be discussed.
- Thursday 12.01.2023, 1pm, MNO 1.040
Solesne Bourguin (Boston University), Quantitative fluctuation analysis of multiscale dynamical systems
Abstract: In this talk, we consider multiscale dynamical systems perturbed by a small Brownian noise and study the limiting behavior of the fluctuations around their deterministic limit from a quantitative standpoint. Using a second order Poincaré inequality based on Malliavin calculus, we obtain rates of convergence for the central limit theorem satisfied by the slow component in the Wasserstein metric. This is joint work with K. Spiliopoulos.
- Thursday 08.12.2022, 1pm, MNO 1.040
Robert Gaunt (University of Manchester), Normal approximation of the posterior in exponential families
Abstract: The Bernstein-von Mises Theorem is a cornerstone of Bayesian statistics. Loosely put, this theorem reconciles Bayesian and frequentist large sample theory by guaranteeing that, under regularity conditions, suitable scalings of posterior distributions are asymptotically normal. In particular, this implies that the contribution of the prior vanishes in the asymptotic posterior.
In this talk, we demonstrate how the probabilistic technique known as Stein’s method can be used to derive explicit optimal order total variation and Wasserstein distance bounds to quantify this distributional approximation for posterior distributions in exponential family models. We apply our general bounds to some classical conjugate prior models and observe that the resulting bounds have an explicit dependence on the prior distribution and on sufficient statistics of the data from the sample, and thus provide insight into how these factors may affect the quality of the normal approximation. This is joint work with Adrian Fischer, Gesine Reinert and Yvik Swan.
- Thursday 01.12.2022, 1pm, MNO 1.040
Michele Stecconi (University of Luxembourg), Density of a random submanifold: the zonoid section
Abstract: I will present a recent paper of the same title, joint work with Léo Mathis.
Given a “nice” smooth random field, we define a family of convex bodies, one for each point of the manifold.
We show that the Kac-Rice density of the expected volume of the zero set equals, at each point, the first intrinsic volume of the convex body and that the intersection of the zero sets of independent fields corresponds to the wedge product of the corresponding convex bodies. This product structure is a recently developed concept, which makes sense within a special class of convex bodies: zonoids. At the same time, the center of the convex body gives the expected current of integration over the zero set.
- Thursday 24.11.2022, 1pm, MNO 1.040
Francesco Grotto (University of Pisa), Random Waves on the Hyperbolic Space
Abstract: The spectral theory of the Laplace operator on hyperbolic space exhibits close analogies with the one on Euclidean space: this allows one to introduce random wave models by means of a Central Limit Theorem for random superpositions of generalized eigenfunctions at a given frequency, just as for Berry’s Euclidean random wave model.
The asymptotic behavior of hyperbolic random waves at high frequency (either in a fixed domain or locally around a given point), or on large domains, presents similarities and differences with respect to well-established models, such as Berry’s model or random spherical harmonics. We will present such a comparison by considering Wiener chaos decompositions of geometric functionals of random waves in different geometries.
(Joint work with Giovanni Peccati)
- Thursday 17.11.2022, 1pm, MNO 1.040
Pierre Alquier (RIKEN AIP), The nice properties of MMD for statistical estimation
Abstract: Maximum likelihood estimation (MLE) enjoys strong optimality properties for statistical estimation, under strong assumptions. However, when these assumptions are not satisfied, MLE is no longer optimal, and sometimes it is totally catastrophic. In this talk, we will explore alternative estimators based on the minimization of well-chosen distances. In particular, we will see that the Maximum Mean Discrepancy (MMD) leads to estimation procedures that are consistent without any assumption on the model or on the true distribution of the data. In practice, this leads to very strong robustness properties. In the second part, we will focus on Bayesian-type estimation. ABC (Approximate Bayesian Computation) is a popular algorithm for computing an approximation of the posterior distribution. However, it relies on the choice of a so-called “summary statistic”, which is not always clear in practice. Here again, we will show that the construction of a summary statistic based on MMD leads to an approximation of the posterior that is actually far more robust than the actual posterior.
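The robustness claim can be illustrated on a toy example: a minimal sketch, assuming a Gaussian location model N(theta, 1), a Gaussian kernel with bandwidth 1, and a contaminated sample. None of these choices come from the talk. For this model the part of MMD² between N(theta, 1) and the empirical distribution that depends on theta is, up to a positive factor, minus the average of exp(-(y_i - theta)² / (2(1 + h²))), so the minimum-MMD estimate can be found by a simple grid search.

```python
import math
import random


def mmd_objective(theta, data, bandwidth=1.0):
    """Theta-dependent part of MMD^2 between N(theta, 1) and the sample `data`.

    Gaussian kernel assumed; additive constants and a positive factor are dropped.
    """
    s2 = 1.0 + bandwidth ** 2  # model variance plus squared kernel bandwidth
    return -sum(math.exp(-(y - theta) ** 2 / (2.0 * s2)) for y in data) / len(data)


def mmd_location_estimate(data, grid, bandwidth=1.0):
    """Minimum-MMD estimate of the location parameter by grid search."""
    return min(grid, key=lambda t: mmd_objective(t, data, bandwidth))


# Hypothetical contaminated sample: 90% from N(0, 1), 10% gross outliers at 50.
random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(180)]
data = clean + [50.0] * 20

grid = [i * 0.05 for i in range(-100, 1101)]  # candidate values from -5 to 55
theta_mle = sum(data) / len(data)             # MLE of the mean in the Gaussian model
theta_mmd = mmd_location_estimate(data, grid)
print(f"sample mean (MLE): {theta_mle:.2f}, minimum-MMD estimate: {theta_mmd:.2f}")
```

The sample mean is dragged towards the outliers, while the MMD-based estimate stays near the centre of the uncontaminated part of the data, because points far from theta contribute almost nothing to the kernel average.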
- Thursday 10.11.2022, 1pm, MNO 1.040
Taras Bodnar (Stockholm University)
High-dimensional portfolio selection: Theory and practice
Abstract: Optimal asset allocation is considered in a high-dimensional asymptotic regime, namely when the number of assets and the sample size tend to infinity at the same rate. Due to the curse of dimensionality in the parameter estimation process, asset allocation for such portfolios becomes a challenging task. Using techniques from random matrix theory, new inferential procedures based on the optimal shrinkage intensity for testing the mean-variance efficiency of a high-dimensional portfolio are developed and the asymptotic distributions of the proposed test statistics are derived. In extensive simulations, we show that the suggested tests have excellent performance characteristics for various values of the concentration ratio. The practical advantages of the proposed procedures are demonstrated in an empirical study based on stocks included in the S&P 500 index. We found that there are periods of time where one can clearly reject the null hypothesis of mean-variance optimality of the equally weighted portfolio. Moreover, the mean-variance portfolio outperforms the equally weighted portfolio in these periods.
- Thursday 27.10.2022, 1pm, MNO 1.040
Jean-Michel Poggi (Paris-Cité University and Paris-Saclay University),
Random Forests: Introduction and industrial applications
Abstract: Random forests (RF) are a statistical learning method extensively used in many fields of application, thanks to their excellent predictive performance. RF are part of the family of tree-based methods and inherit the intrinsic flexibility of trees: they adapt to both classification and regression problems and extend easily to many different types of data. We will first give an introduction to RF, the definition of the variable importance measures and the related variable selection capability. We then illustrate their practical power in two different applied contexts. The first one relates to physiological signal processing and addresses functional variable selection for driver’s stress level classification. The second is about the aggregation of multi-scale experts for bottom-up electricity load forecasting.
References
- Genuer, Poggi, Random Forests with R, 98 p., Use’R!, Springer, 2020
- Genuer, Poggi, Tuleau, Variable selection using Random Forests, Pattern Recognition Letters, 31(14), p. 2225-2236, 2010
- El Haouij, Poggi, Ghozi, Sevestre Ghalila, Jaïdane, Random Forest-Based Approach for Physiological Functional Variable Selection for Driver’s Stress Level Classification, Statistical Methods & Applications, 1-29, 2018
- Goehry, Goude, Massart, Poggi, Aggregation of Multi-scale Experts for Bottom-up Load Forecasting, IEEE Transactions on Smart Grid, vol. 11, 3, 1895-1904, 2020.
- Thursday 20.10.2022, 1pm, MNO 1.040
Yassine Nachit (Cadi Ayyad University),
Local times for systems of non-linear stochastic heat equations
Abstract: We consider $u(t,x)=(u_1(t,x),\cdots,u_d(t,x))$ the solution to a system of non-linear stochastic heat equations in spatial dimension one driven by a $d$-dimensional space-time white noise. We prove that, when $d\leq 3$, the local time $L(\xi,t)$ of $\{u(t,x)\,,\;t\in[0,T]\}$ exists and $L(\bullet,t)$ belongs a.s. to the Sobolev space $H^{\alpha}(\mathbb{R}^d)$ for $\alpha<\frac{4-d}{2}$, and when $d\geq 4$, the local time does not exist. We also show joint continuity and establish Hölder conditions for the local time of $\{u(t,x)\,,\;t\in[0,T]\}$. These results are then used to investigate the irregularity of the coordinate functions of $\{u(t,x)\,,\;t\in[0,T]\}$. Compared with similar results obtained for the linear stochastic heat equation (i.e., when the solution is Gaussian), we believe that our results are sharp. Finally, we get a sharp estimate for the partial derivatives of the joint density of $(u(t_1,x)-u(t_0,x),\cdots,u(t_n,x)-u(t_{n-1},x))$, which is a new result and of independent interest.
- Thursday 06.10.2022, 1pm, MNO 1.020
Alexandre Tsybakov (CREST – ENSAE)
Statistical decision for variable selection
Abstract: For the core variable selection problem under the Hamming loss, we derive a non-asymptotic exact minimax selector over the class of all s-sparse vectors, which is also the Bayes selector with respect to the uniform prior. While this optimal selector is, in general, not realizable in polynomial time, we show that its tractable counterpart (the scan selector) attains the minimax expected Hamming risk to within factor 2 and moreover is exact minimax under the probability of wrong recovery criterion. In the monotone likelihood ratio framework, we establish explicit lower bounds on the minimax risk and provide its tight characterization in terms of the best separable selector risk. As a consequence, we obtain sharp necessary and sufficient conditions of exact and almost full recovery in the location model with light tail distributions and in the problem of group variable selection under Gaussian noise. The talk is based on a joint work with Cristina Butucea, Enno Mammen and Simo Ndaoud.
- Thursday 29.09.2022, 1pm, MNO 1.020
Sébastien Darses (Aix-Marseille University)
On probabilistic generalizations of the Nyman-Beurling criterion and new closed-form identities involving the Riemann Zeta function
Abstract: One of the seemingly innocent reformulations of the terrifying Riemann Hypothesis (RH) is the Nyman-Beurling criterion: the indicator function of (0,1) can be linearly approximated in L^2 by dilations of the fractional part function. Randomizing these dilations generates new structures and criteria for RH, regularizing very intricate ones. Another appealing possibility is to consider polynomials instead of Dirichlet polynomials for the approximations. The Zeta function is then encoded in this last framework as a weighted density measure on the critical line. We prove closed-form identities for the moments involved (determinate Hamburger problem, and thus a full characterization).
The talk will be very accessible, especially for graduate students and a quick review on the Zeta function will be given.
Joint work with F. Alouges and E. Hillion.
- Thursday 22.09.2022, 1pm, MNO 1.020
Nikolai Leonenko (Cardiff University)
Sojourn functionals for spatiotemporal Gaussian random fields with long-memory
Abstract: The paper [3] addresses the asymptotic analysis of sojourn functionals of spatiotemporal Gaussian random fields with long-range dependence (LRD) in time, also known as long memory. Specifically, reduction theorems are derived for local functionals of nonlinear transformations of such fields, with Hermite rank m ≥ 1, under general covariance structures. These results are proven to hold, in particular, for a family of non-separable covariance structures belonging to the Gneiting class. For m = 2, under separability of the spatiotemporal covariance function in space and time, the properly normalized Minkowski functional, involving the modulus of a Gaussian random field, converges in distribution to a Rosenblatt-type limiting distribution for a suitable range of the long memory parameter. For spatiotemporal isotropic stationary fields on the sphere, similar results were obtained in Marinucci et al. [5]. Some other related results can be found in Makogin and Spodarev [4]. For short-memory random fields, the asymptotic analysis of sojourn functionals can be done using the Malliavin-Stein technique, fourth-moment limit theorems, and Breuer-Major type theorems (see [1,2,6,7,8] and the references therein). This is joint work with M.D. Ruiz-Medina (Granada University, Spain).
References:
[1] Bourguin, S., Campese, S., Leonenko, N. and Taqqu, M.S. (2019) Four moments theorems on Markov chaos. Ann. Probab. 47 (2019), no. 3, 1417–1446
[2] Ivanov, A.V., Leonenko, N.N., Ruiz-Medina, M.D. and Savich, I.N. (2013) Limit theorems for weighted non-linear transformations of Gaussian processes with singular spectra, Ann. Probab., vol. 41, no. 2, 1088–1114
[3] Leonenko, N.N. and Ruiz-Medina, M.D. (2022) Sojourn functionals for spatiotemporal Gaussian random fields with long-memory, Journal of Applied Probability, in press.
[4] Makogin, V. and Spodarev, E. (2022). Limit theorems for excursion sets of subordinated Gaussian random fields with long-range dependence, Stochastics, 94, 111–142
[5] Marinucci, D., Rossi, M. and Vidotto, A. (2020). Non-universal fluctuations of the empirical measure for isotropic stationary fields on S2 × R, Annals of Applied Probability, 31, 2311–2349
[6] Nourdin, I. and Peccati, G. (2015) The optimal fourth moment theorem. Proc. Amer. Math. Soc. 143 (2015), no. 7, 3123–3133.
[7] Nourdin, I., Peccati, G. and Podolskij, M. (2011) Quantitative Breuer-Major theorems. Stochastic Process. Appl. 121 (2011), no. 4, 793–812.
- Monday 05.09.2022, 4pm, MNO 1.050
Zhen-Qing Chen (University of Washington – Seattle)
Long range random walk on infinite groups
Abstract: Given the original discrete group and a random walk on it driven by a certain type of symmetric probability measure, there exists a homogeneous nilpotent Lie group which carries an adapted dilation structure and a stable-like process which appears in a Donsker-type functional limit theorem as the limit of a rescaled version of the random walk. Both the limit group and the limit process on that group depend on the driving probability measure. In addition to the functional limit theorem, a local limit theorem is also established. Based on joint work with Takashi Kumagai, Laurent Saloff-Coste, Jian Wang and Tianyi Zheng.