Title: Faster algorithms for automatic variable selection.


Statistical learning algorithms scan large databases to extract relevant information. When a considerable number of variables is involved, sparse models such as the Lasso or Support Vector Machines (SVM) are used to select the most critical variables in regression or classification problems. For example, the Lasso estimator depends solely on a subset of features (the so-called equicorrelation set), while the SVM classifier depends only on a subset of samples (the support vectors). The other features or observations do not contribute to the optimal solution. Thus, rapidly detecting these non-influential variables can lead to significant savings in memory and computation. In this presentation, I will discuss the (safe) screening rules that have recently been introduced as a technique for explicitly identifying sparse structures in optimization problems arising in machine learning. They have led to efficient acceleration methods based on substantial dimensionality reduction. For convex and separable functions, I will explain how these rules stem from a simple combination of a natural property of subdifferential sets and optimality conditions. I will then present them in a unified way, along with a complexity analysis bounding the number of iterations any convergent algorithm needs to identify the optimal active set. I will also elaborate on future work and open challenges.
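To make the idea concrete, here is a minimal sketch of one well-known instance of a safe screening rule, the Gap Safe sphere test for the Lasso. This is an illustration of the general technique the abstract refers to, not the specific unified framework of the talk: any primal iterate and any dual-feasible point define a duality gap, and features whose worst-case correlation over the resulting safe sphere stays below one are certified to have a zero optimal coefficient. All names and the tiny synthetic problem below are assumptions made for the demo.

```python
import numpy as np

def gap_safe_screen(X, y, w, lam):
    """Gap Safe sphere test for the Lasso (illustrative sketch).

    Returns a boolean mask: True at index j certifies that the
    optimal Lasso coefficient w_j* is zero, so feature j can be
    safely discarded from subsequent iterations.
    """
    r = y - X @ w                                   # residual at current iterate
    # Rescale the residual so theta is dual feasible: ||X.T theta||_inf <= 1
    theta = r / max(lam, np.max(np.abs(X.T @ r)))
    primal = 0.5 * r @ r + lam * np.sum(np.abs(w))
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    gap = max(primal - dual, 0.0)                   # duality gap
    radius = np.sqrt(2.0 * gap) / lam               # radius of the safe sphere
    norms = np.linalg.norm(X, axis=0)
    # Safe test: if the correlation cannot reach 1 anywhere in the
    # sphere centered at theta, the feature is provably inactive.
    return np.abs(X.T @ theta) + norms * radius < 1.0

# Tiny demo: run a few ISTA steps on synthetic data, then screen.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(n)

lam = 0.5 * np.max(np.abs(X.T @ y))                 # fairly strong regularization
L = np.linalg.norm(X, 2) ** 2                       # Lipschitz constant of the gradient
w = np.zeros(p)
for _ in range(200):                                # plain ISTA iterations
    z = w - X.T @ (X @ w - y) / L
    w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)

inactive = gap_safe_screen(X, y, w, lam)
print(f"screened out {inactive.sum()} of {p} features")
```

As the duality gap shrinks along the iterations, the sphere radius shrinks and more features are eliminated, which is exactly the dimensionality reduction mentioned above: once a feature is screened out, its column can be dropped from every remaining matrix-vector product.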


Eugene is a postdoctoral researcher at the RIKEN Center for Advanced Intelligence Project (AIP), in the data-driven biomedical science team led by Ichiro Takeuchi. He completed a PhD in Applied Mathematics under the supervision of Olivier Fercoq and Joseph Salmon at Telecom ParisTech (Institut Polytechnique de Paris). His thesis focused on the design and analysis of faster and safer optimization algorithms for variable selection and hyperparameter calibration. His current and forthcoming work focuses on the automatic and efficient construction of confidence regions, as well as on the analysis of implicit biases in the choice of optimization algorithms for machine learning.