

This edition will have the pleasure of welcoming three invited speakers:

Research Director, INRIA, LIP, ENS Lyon
Research Scientist, Google Brain; Professor, CREST, ENSAE, Institut Polytechnique de Paris
Professor, Télécom Paris, LTCI, Institut Polytechnique de Paris
Promoting sparse connections in neural networks is a natural way to control their computational complexity, possibly at the cost of reduced expressivity. Given its documented role in inverse problems and variable selection, one can also expect sparsity to help design more interpretable network learning mechanisms. The talk will present recent explorations around this theme. I will first highlight the role of an invariant path-embedding of the parameters of a network, both to learn the parameters and to analyze their identifiability from the function implemented by the network. Then, I will describe an unexpected decoupling between the absence of spurious local valleys/minima in the optimization landscape of shallow sparse linear network learning and the tractability of the problem. In the process, we identify an algorithm bringing speedups of up to two orders of magnitude when learning certain fast transforms via multilayer sparse factorization.
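As a hedged illustration of the fast-transform idea evoked in this abstract (not the speaker's algorithm): the Hadamard transform on 2^k points factors exactly into k butterfly factors, each with only two nonzeros per row, so applying it through the factors costs O(n log n) instead of O(n^2). A minimal NumPy sketch:

```python
import numpy as np
from functools import reduce

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])  # 2x2 Hadamard block

def butterfly_factors(k):
    """Sparse factors whose product is the 2**k Hadamard matrix H2^(x k)."""
    return [
        np.kron(np.kron(np.eye(2**i), H2), np.eye(2**(k - 1 - i)))
        for i in range(k)
    ]

k = 3
hadamard = reduce(np.kron, [H2] * k)       # dense 8x8 transform
factors = butterfly_factors(k)
product = reduce(np.matmul, factors)

assert np.allclose(product, hadamard)      # exact multilayer sparse factorization
# each factor has exactly 2 nonzeros per row -> O(n log n) matrix-vector products
assert all((np.abs(f) > 0).sum(axis=1).max() == 2 for f in factors)
```

Learning such factors from the dense matrix (rather than constructing them analytically, as here) is the multilayer sparse factorization problem the talk refers to.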
Computing or approximating an optimal transport (OT) cost is rarely the sole goal when using OT in applications. In most cases the end goal instead relies on solving the OT problem and studying the differentiability of its solutions w.r.t. arbitrary inputs, be they point clouds in Euclidean spaces or points on a graph. In this talk I will present recent applications that highlight this necessity, as well as concrete algorithmic and programmatic solutions to handle such issues, which have been implemented in Optimal Transport Tools (OTT), a Python toolbox using JAX for differentiable programming.
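To make the setting concrete, here is a minimal from-scratch NumPy sketch of entropic OT between two point clouds via Sinkhorn iterations; this is an illustration, not OTT's API (OTT implements the same scheme in JAX, so the solution can be differentiated with respect to the inputs):

```python
import numpy as np

def sinkhorn(x, y, a, b, epsilon=0.5, n_iters=5000):
    """Entropic OT between weighted point clouds, squared Euclidean cost."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # cost matrix
    K = np.exp(-C / epsilon)                            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):                            # alternating scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                     # transport plan
    return P, (P * C).sum()                             # plan and OT cost

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(7, 2))
a, b = np.full(5, 1 / 5), np.full(7, 1 / 7)
P, cost = sinkhorn(x, y, a, b)
assert np.allclose(P.sum(axis=1), a, atol=1e-4)  # marginal constraints hold
assert np.allclose(P.sum(axis=0), b, atol=1e-4)
```

Because every step above is a smooth operation, wrapping the same loop in JAX yields gradients of the cost and plan w.r.t. the point clouds, which is the differentiability the talk emphasizes.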
Motivated by real-world complex output prediction tasks such as link prediction, molecule identification or functional output regression, we propose to leverage the notion of an output kernel to take into account the nature of the output variables, whether they are discrete structures or functions. This approach boils down to encoding output data as vectors of the reproducing kernel Hilbert space associated with the so-called output kernel. We exhibit a large family of predictive models suitable as hypothesis spaces, ranging from output kernel trees to vector-valued kernel models as well as deep kernel models. We also highlight different losses and the properties they need in order to implement these approaches in practice, and present learning algorithms devoted to those tasks. Interestingly, the benefits of predicting a functional output go beyond the case where functional outputs are actually observed. We present infinite task learning, an extension of multitask learning to a continuum of tasks, exemplified by quantile regression, cost-sensitive classification or one-class SVM. We show how to impose specific constraints on the shape of the predicted function over the hyperparameter space, enlarging the available regularization tools. Finally, larger-scale problems such as style transfer can be tackled by vectorial ITL, with an emphasis on emotion transfer for facial landmarks.
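As a hedged, minimal sketch of the vector-valued kernel idea (the simplest separable case, an identity-operator-valued kernel k(x,x')·I, not the speaker's full framework): one shared Gram matrix gives a closed-form ridge estimator whose predictions live directly in the output vector space.

```python
import numpy as np

def rbf(X1, X2, gamma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_vv_krr(X, Y, lam=1e-6, gamma=1.0):
    """Vector-valued kernel ridge regression with separable kernel k(x,x')*I:
    closed-form coefficients A = (K + lam*I)^{-1} Y, one column per output dim."""
    K = rbf(X, X, gamma)
    A = np.linalg.solve(K + lam * np.eye(len(X)), Y)
    return lambda Xq: rbf(Xq, X, gamma) @ A   # predictions are output vectors

X = np.linspace(0.0, 4.0, 5)[:, None]
Y = np.stack([np.sin(X[:, 0]), np.cos(X[:, 0])], axis=1)  # 2-dim outputs
predict = fit_vv_krr(X, Y)
assert np.allclose(predict(X), Y, atol=1e-3)  # interpolates training outputs
```

Richer operator-valued kernels couple the output dimensions instead of treating them independently, which is where the expressive power discussed in the abstract comes from.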
Session MALIA (SFdS): statistics and optimization

Associate Professor (Maître de Conférences), Laboratoire Jean Kuntzmann (LJK), Université Grenoble Alpes, DAO Team
Postdoctoral researcher (MIT, CSAIL, USA) 

Many problems in data science (regression, classification, clustering, etc.) lead to the minimization of a risk function that measures the agreement between a model and the data. However, when the number of parameters of the model becomes large and the difficulty of the problem increases, risk minimization gets harder and the stability of the obtained model degrades. To overcome this issue, a popular solution is to introduce a prior on the structure of the model. For instance, we may want the obtained model to be sparse or low-rank, so that the number of nonzero parameters is not too large. In this talk, we study how certain optimization methods produce iterates that partially or exactly recover this structure. Then, we show how this information can be harnessed to numerically accelerate these algorithms.
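A hedged sketch of this phenomenon with the proximal gradient method (ISTA) on the lasso, purely illustrative of the abstract's theme: the soft-thresholding step produces iterates with exact zeros, so the sparse structure appears along the iterations and could then be exploited (e.g. by restricting computations to the identified support).

```python
import numpy as np

def ista(X, y, lam, n_iters=500):
    """Proximal gradient for 0.5*||Xw - y||^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    objs = []
    for _ in range(n_iters):
        g = X.T @ (X @ w - y)              # gradient of the smooth part
        z = w - g / L
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
        objs.append(0.5 * np.sum((X @ w - y) ** 2) + lam * np.abs(w).sum())
    return w, objs

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
w_true = np.zeros(10)
w_true[[0, 3, 7]] = [2.0, -3.0, 1.5]       # sparse ground truth
y = X @ w_true + 0.01 * rng.normal(size=40)
w_hat, objs = ista(X, y, lam=1.0)
assert objs[-1] <= objs[0]                 # ISTA decreases the objective
assert np.sum(w_hat == 0.0) >= 5           # iterates carry exact zeros
```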
We will see how optimal transport can be cast as a stochastic optimization problem, which allows stochastic gradient methods to be leveraged to solve related problems. Beyond efficient online estimation of optimal transport distances, these methods can also be extended to compute large-scale Wasserstein barycenters, a notion of average between probability measures.
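A hedged sketch of one such stochastic formulation (illustrative, not necessarily the speaker's exact setting): for a discrete target measure, the OT cost equals the maximum over dual variables v of an expectation over samples from the source measure, so plain SGD applies, and any v yields a lower bound on the OT cost.

```python
import numpy as np

def semidual_value(v, x, y, b):
    # H(v) = sum_j b_j v_j + mean_i min_j (c(x_i, y_j) - v_j), c = squared dist
    C = (x[:, None] - y[None, :]) ** 2
    return b @ v + np.mean((C - v[None, :]).min(axis=1))

rng = np.random.default_rng(0)
n = 6
x = np.sort(rng.normal(size=n))            # 1D source points, uniform weights
y = np.sort(rng.normal(loc=1.0, size=n))   # 1D target points
b = np.full(n, 1 / n)
exact = np.mean((x - y) ** 2)              # 1D squared cost: sorted matching

v = np.zeros(n)
for t in range(20000):                     # SGD on the concave semi-dual
    i = rng.integers(n)                    # sample x_i from the source measure
    j = np.argmin((x[i] - y) ** 2 - v)     # c-transform argmin for this sample
    grad = b.copy()
    grad[j] -= 1.0                         # stochastic supergradient
    v += (0.1 / np.sqrt(t + 1)) * grad

est = semidual_value(v, x, y, b)
assert est <= exact + 1e-9   # any dual point lower-bounds the OT cost
assert est > exact - 0.5     # and SGD drives the bound toward the optimum
```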
Session CRITEO: Privacy and Machine Learning
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
We address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, that employs multiparty lattice-based cryptography and preserves the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to N − 1 parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
Challenge results session: Tuesday the 15th at 4:45 pm
Session AFIA: hybrid methods

Associate professor, KU Leuven, Belgium 
Professor, University of Vienna, Austria, Research Group Neuroinformatics 
Increasingly, the predictions made by a machine learning system are used in the process of solving real-life decision-making problems, such as scheduling, allocation and combinatorial optimisation in general. A widely used strategy is the predict-then-optimize approach, where first a machine learning (ML) model is trained to make point predictions (e.g. demand, load, value) and then the optimization problem is solved using the predictions. A more appropriate choice would be to integrate the prediction and the optimization task and train the ML model using a decision-focused loss. This hybrid learning-and-reasoning approach combines the best of (gradient-descent) learning and constraint-based reasoning through combinatorial optimisation. Such a predict-and-optimize approach has proven effective in various tasks. However, computational complexity and scalability are two major roadblocks for the hybrid predict-and-optimize approach: an NP-hard optimization problem must be solved and differentiated for each training instance in each training epoch to obtain a gradient of the optimization task for backpropagation during model training. In this talk, we review a range of recent approaches to address these roadblocks, including decision-focused learning for black-box and white-box systems, as well as approximations and other techniques that make this promising approach increasingly feasible.
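A hedged sketch of the two-stage predict-then-optimize pipeline that the talk contrasts with decision-focused learning (a generic illustration, not the speakers' system): a regression model predicts item values, then a combinatorial solver (0/1 knapsack via dynamic programming) acts on the predictions, and decision quality is measured against the optimum under the true values.

```python
import numpy as np

def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns chosen item indices."""
    n = len(values)
    best = np.zeros((n + 1, capacity + 1))
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i, c] = best[i - 1, c]
            if weights[i - 1] <= c:
                cand = best[i - 1, c - weights[i - 1]] + values[i - 1]
                best[i, c] = max(best[i, c], cand)
    chosen, c = [], capacity          # backtrack to recover the chosen set
    for i in range(n, 0, -1):
        if best[i, c] != best[i - 1, c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return chosen

rng = np.random.default_rng(0)
F = rng.normal(size=(30, 4))          # item features
true_vals = F @ np.array([1.0, 2.0, -1.0, 0.5]) + 5 + 0.1 * rng.normal(size=30)
weights = rng.integers(1, 10, size=30)

# stage 1: predict item values from features (least squares)
A = np.c_[F, np.ones(30)]
coef, *_ = np.linalg.lstsq(A, true_vals, rcond=None)
pred_vals = A @ coef

# stage 2: optimize using the *predicted* values
sel = knapsack(pred_vals, weights, capacity=40)
assert weights[sel].sum() <= 40       # the decision is feasible
# regret is nonnegative: acting on predictions cannot beat the true optimum
opt = knapsack(true_vals, weights, capacity=40)
assert true_vals[sel].sum() <= true_vals[opt].sum() + 1e-9
```

Decision-focused learning would instead backpropagate the gap between the two solutions through the solver, which is exactly where the differentiation and scalability difficulties discussed in the talk arise.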
Machine learning models are increasingly deployed in high-stakes environments, e.g., in medical decision support or loan approval, where ethical and legal considerations require models to be interpretable. I briefly review challenges in designing interpretable machine learning (IML) methods and then argue that we can resolve some of these issues by adopting a causal perspective on the problem. In particular, I argue that we need to distinguish between interpreting the model and using the model to interpret the data-generating process, and demonstrate how a causal perspective can disentangle these two questions.
Panelists for the round table "Responsible Machine Learning: what can we do at our level?" (Wednesday the 16th at 2:30 pm)

Professor, Télécom Paris, LTCI, Institut Polytechnique de Paris
Research Scientist, Google Brain; Professor, CREST, ENSAE, Institut Polytechnique de Paris
Professor, Université de Rennes I, IUF
Research Director, INRIA, LIP, ENS Lyon
Director of Research, Criteo AI Lab
Professor, Université de Lille, INRIA Lille