This edition will have the pleasure of welcoming three invited speakers:
INRIA Research Director, LIP, ENS-Lyon
Research Scientist - Google Brain, Professor, CREST - ENSAE, Institut Polytechnique de Paris
Professor, Télécom Paris, LTCI, Institut Polytechnique de Paris
Promoting sparse connections in neural networks is a natural way to control their computational complexity, possibly at the cost of reduced expressivity. Given its documented role in inverse problems and variable selection, one can also expect sparsity to help design more interpretable network learning mechanisms. The talk will present recent explorations around this theme. I will first highlight the role of an invariant path-embedding of the parameters of a network, both to learn the parameters and to analyze their identifiability from the function implemented by the network. Then, I will describe an unexpected decoupling between the absence of spurious local valleys/minima in the optimization landscape of shallow sparse linear network learning and the tractability of the problem. In the process, we identify an algorithm bringing speedups of up to two orders of magnitude when learning certain fast transforms via multilayer sparse factorization.
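As a minimal illustration of the path-embedding mentioned above (my own toy sketch, not code from the talk): for a two-layer ReLU network, the per-path products of weights are left unchanged by the neuron-wise rescalings that also leave the implemented function unchanged, which is what makes this embedding a natural invariant parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # layer 1: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # layer 2: 4 hidden units -> 2 outputs

def relu_net(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)

def path_embedding(W1, W2):
    # One coordinate per path input i -> hidden j -> output k: W2[k, j] * W1[j, i].
    return np.einsum('kj,ji->kji', W2, W1)

# Neuron-wise rescaling: scale hidden unit j's incoming weights by lam[j] > 0
# and its outgoing weights by 1/lam[j].  By positive homogeneity of ReLU the
# network function is unchanged, and so is the path embedding.
lam = np.array([0.5, 2.0, 3.0, 0.1])
W1_resc = lam[:, None] * W1
W2_resc = W2 / lam[None, :]

x = rng.normal(size=3)
assert np.allclose(relu_net(x, W1, W2), relu_net(x, W1_resc, W2_resc))
assert np.allclose(path_embedding(W1, W2), path_embedding(W1_resc, W2_resc))
print("function and path embedding are invariant under neuron rescaling")
```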
Computing or approximating an optimal transport cost is rarely the sole goal when using OT in applications. In most cases the end goal instead relies on solving the OT problem and studying the differentiability properties of its solutions w.r.t. arbitrary inputs, be they point clouds in Euclidean spaces or points on a graph. In this talk I will present recent applications that highlight this necessity, as well as concrete algorithmic and programmatic solutions to handle such issues, which have been implemented in Optimal Transport Tools (OTT), a Python toolbox using JAX for differentiable programming.
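To give a flavour of differentiating through OT (a minimal pure-JAX sketch under my own assumptions, not the actual OTT API): an entropy-regularized Sinkhorn cost written with jax.numpy can be differentiated with respect to the input point cloud via automatic differentiation; OTT provides this and much more, including implicit differentiation of the solver.

```python
import jax
import jax.numpy as jnp

def sinkhorn_cost(x, y, eps=0.1, n_iter=100):
    """Entropy-regularized OT cost between two uniform point clouds."""
    C = jnp.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)   # squared-Euclidean cost
    a = jnp.full(x.shape[0], 1.0 / x.shape[0])
    b = jnp.full(y.shape[0], 1.0 / y.shape[0])
    f = jnp.zeros(x.shape[0])
    g = jnp.zeros(y.shape[0])
    for _ in range(n_iter):   # log-domain Sinkhorn iterations (unrolled for autodiff)
        f = -eps * jax.scipy.special.logsumexp((g[None, :] - C) / eps + jnp.log(b)[None, :], axis=1)
        g = -eps * jax.scipy.special.logsumexp((f[:, None] - C) / eps + jnp.log(a)[:, None], axis=0)
    P = jnp.exp((f[:, None] + g[None, :] - C) / eps) * a[:, None] * b[None, :]
    return jnp.sum(P * C)

x = jax.random.normal(jax.random.PRNGKey(0), (32, 2))
y = jax.random.normal(jax.random.PRNGKey(1), (32, 2)) + 2.0
cost, grad_x = jax.value_and_grad(sinkhorn_cost)(x, y)   # gradient w.r.t. the point cloud x
print(cost, grad_x.shape)
```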
Motivated by real-world complex output prediction tasks such as link prediction, molecule identification or functional output regression, we propose to leverage the notion of output kernel to take into account the nature of output variables, whether they be discrete structures or functions. This approach boils down to encoding output data as vectors of the reproducing kernel Hilbert space associated with the so-called output kernel. We exhibit a large family of predictive models, ranging from output kernel trees to vector-valued kernel models as well as deep kernel models, that are suitable as hypothesis spaces. We also highlight different losses, and the properties they must satisfy, that allow these approaches to be implemented in practice, and present learning algorithms devoted to those tasks. Interestingly, the benefits of working with functional outputs extend beyond the case where functional outputs are actually observed. We present infinite task learning, an extension of multi-task learning to a continuum of tasks, exemplified by quantile regression, cost-sensitive classification or one-class SVM. We show how to impose specific constraints on the shape of the predicted function over the hyperparameter space, enlarging the set of available regularization tools. Finally, larger-scale problems such as style transfer can be tackled by vectorial ITL, with an emphasis on emotion transfer for facial landmarks.
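A toy numpy sketch of the output-kernel idea (a hypothetical example in the spirit of kernel dependency estimation, not code from the talk): fit kernel ridge regression from inputs to the output-kernel feature space, then decode a prediction by searching a finite candidate set for the output whose embedding is closest in the output RKHS.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy "structured" outputs: inputs x in R^2, outputs y in R^3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
Y = np.stack([X[:, 0] + X[:, 1], X[:, 0] - X[:, 1], X[:, 0] * X[:, 1]], axis=1)

Kx = rbf(X, X)                                                   # input kernel
A = np.linalg.solve(Kx + 1e-2 * np.eye(len(X)), np.eye(len(X)))  # kernel ridge coefficients

def predict(x_new, candidates):
    # Predicted output embedding: sum_i alpha_i(x) phi(y_i), with alpha(x) = (Kx + lam I)^{-1} k_x(x).
    # Decode by minimizing ||phi(c) - sum_i alpha_i phi(y_i)||^2 over the candidate set,
    # which only requires output-kernel evaluations.
    alpha = A @ rbf(x_new[None, :], X)[0]
    Kyc = rbf(candidates, Y)                 # output-kernel values k_y(c, y_i)
    scores = 1.0 - 2.0 * Kyc @ alpha         # RKHS distance up to an alpha-only constant (k_y(c,c)=1)
    return candidates[np.argmin(scores)]

x_test = np.array([1.0, -0.5])
print(predict(x_test, candidates=Y), "target:", [0.5, 1.5, -0.5])
```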
Session MALIA (SFdS) - statistics and optimization:
Associate Professor (Maître de Conférences), Laboratoire Jean Kuntzmann (LJK), Université Grenoble Alpes, DAO team
Post-doctoral researcher (MIT, CSAIL, USA) |
Many problems in data science (regression, classification, clustering, etc.) lead to the minimization of some risk function that measures the fit between a model and the data. However, when the number of parameters of the model becomes large and the difficulty of the problem increases, the risk minimization becomes harder and the stability of the obtained model degrades. To overcome this issue, a popular solution is to introduce a prior on the structure of the model. For instance, we may want the obtained model to be sparse or low-rank, so that the number of nonzero parameters of the model is not too large. In this talk, we study how certain optimization methods produce iterates that recover this structure, partially or exactly. Then, we show how this information can be harnessed to numerically accelerate these algorithms.
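A minimal numpy sketch of this phenomenon (an illustrative example, not the specific algorithms of the talk): proximal gradient (ISTA) on a Lasso problem produces iterates whose support stabilizes on a small set of coordinates after finitely many steps; once identified, that structure can be exploited, e.g. by restricting subsequent computations to the active coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 200, 5
A = rng.normal(size=(n, d))
x_true = np.zeros(d)
x_true[rng.choice(d, k, replace=False)] = 3.0 * rng.normal(size=k)
b = A @ x_true + 0.01 * rng.normal(size=n)

lam = 0.1 * np.max(np.abs(A.T @ b))          # l1 regularization strength
L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the smooth part

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(d)
support, last_change = frozenset(), 0
for it in range(1, 501):
    # ISTA: gradient step on 0.5*||Ax - b||^2, then prox of lam*||.||_1
    x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    new_support = frozenset(np.flatnonzero(x))
    if new_support != support:
        support, last_change = new_support, it

print("support stopped changing at iteration", last_change)
print("identified support:", sorted(int(i) for i in support))
print("true support:      ", sorted(int(i) for i in np.flatnonzero(x_true)))
```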
We will see how optimal transport can be cast as a stochastic optimization problem, which makes it possible to leverage stochastic gradient methods to solve related problems. Beyond efficient online estimation of optimal transport distances, these methods can also be extended to compute large-scale Wasserstein barycenters, a notion of average between probability measures.
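A small numpy sketch of this viewpoint (my own illustration, assuming the semi-dual formulation of entropic OT with a discrete target measure, in the spirit of Genevay et al.): sampling points from the source measure yields unbiased stochastic gradients of the semi-dual objective, so the dual potential can be learned online by averaged SGD.

```python
import numpy as np

rng = np.random.default_rng(0)
m, eps = 100, 0.05                                     # target support size, entropic regularization
y = rng.normal(size=(m, 2)) + np.array([3.0, 0.0])     # discrete target measure nu (uniform weights)
nu = np.full(m, 1.0 / m)

def sample_mu(batch):
    return rng.normal(size=(batch, 2))                 # source measure mu: standard Gaussian

v, v_avg = np.zeros(m), np.zeros(m)                    # dual potential on the target support
for t in range(1, 5001):
    x = sample_mu(8)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # Stochastic gradient of E_x[ <v, nu> - eps * logsumexp((v - C(x, .))/eps + log nu) ]:
    # nu minus the softmax over target points, averaged over the minibatch.
    z = (v[None, :] - C) / eps + np.log(nu)[None, :]
    p = np.exp(z - z.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    v += (1.0 / np.sqrt(t)) * (nu - p.mean(axis=0))    # ascent step
    v_avg += (v - v_avg) / t                           # Polyak averaging of iterates

print("first entries of the averaged dual potential:", np.round(v_avg[:5], 3))
```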
Session CRITEO - Privacy and Machine Learning:
École polytechnique fédérale de Lausanne, Switzerland
We address the problem of privacy-preserving training and evaluation of neural networks in an N-party, federated learning setting. We propose a novel system, POSEIDON, that employs multiparty lattice-based cryptography and preserves the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to N − 1 parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
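POSEIDON itself relies on multiparty lattice-based homomorphic encryption, which is far beyond a short snippet; as a much simpler stand-in for the general idea of N-party aggregation without revealing individual inputs, here is a toy additive secret-sharing sketch (not POSEIDON's actual protocol, and only illustrative: real schemes use uniform shares over finite rings and also protect the model and the evaluation data).

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim = 10, 6                               # 10 parties, toy gradient dimension
gradients = rng.normal(size=(N, dim))        # each party's private local update

def share(vec, n_parties):
    # Split vec into n_parties additive shares that sum back to vec;
    # share j is sent to party j, so no single party sees vec in the clear.
    blinds = 10.0 * rng.normal(size=(n_parties - 1, vec.shape[0]))
    return np.vstack([blinds, vec - blinds.sum(axis=0)])

all_shares = np.stack([share(g, N) for g in gradients])   # (owner, receiver, dim)
partial_sums = all_shares.sum(axis=0)        # each party reveals only the sum of received shares
aggregate = partial_sums.sum(axis=0)         # equals the sum of all private gradients

assert np.allclose(aggregate, gradients.sum(axis=0))
print("aggregated update:", np.round(aggregate, 3))
```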
Challenge results session - Tuesday the 15th at 4:45 pm
Session AFIA - hybrid methods:
Associate professor, KU Leuven, Belgium |
Professor, University of Vienna, Austria, Research Group Neuroinformatics |
Increasingly, the predictions made by a machine learning system are used in the process of solving real-life decision-making problems, such as scheduling, allocation and combinatorial optimisation in general. Predict-then-optimize is a widely used approach, in which a machine learning (ML) model is first trained to make point predictions (e.g. demand, load, value) and the optimization problem is then solved using those predictions. A more appropriate choice is to integrate the prediction and the optimization task and train the ML model using a decision-focused loss. This hybrid learning and reasoning approach combines the best of (gradient-descent) learning and constraint-based reasoning through combinatorial optimisation. Such a predict-and-optimize approach has proven effective in various tasks. However, computational complexity and scalability are two major roadblocks for the hybrid predict-and-optimize approach, because an NP-hard optimization problem must be solved and differentiated for each training instance at each training epoch to obtain a gradient of the optimization task for backpropagation during model training. In this talk, we review a range of recent approaches to address these roadblocks, including decision-focused learning for black-box and white-box systems, as well as approximations and other techniques that make this promising approach increasingly feasible.
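One concrete decision-focused surrogate, as a sketch (the SPO+ subgradient of Elmachtoub and Grigas on a toy selection problem with an enumerated feasible set; this is one example from the broader family the talk surveys, not a specific method from it): each training step only needs two calls to the optimization oracle.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_items, k = 8, 3
# Feasible set: choose exactly k of n_items; the oracle minimizes c^T w by enumeration.
feasible = np.array([[1.0 if i in S else 0.0 for i in range(n_items)]
                     for S in combinations(range(n_items), k)])

def solve(c):
    return feasible[np.argmin(feasible @ c)]

# Toy data: true costs are a noisy linear function of features.
W_true = rng.normal(size=(n_items, 4))
X = rng.normal(size=(200, 4))
C = X @ W_true.T + 0.1 * rng.normal(size=(200, n_items))

W = np.zeros((n_items, 4))                   # linear cost predictor c_hat = W x
lr = 0.01
for epoch in range(30):
    for x, c in zip(X, C):
        c_hat = W @ x
        # SPO+ subgradient w.r.t. predicted costs: 2 * (w*(c) - w*(2*c_hat - c))
        g_c = 2.0 * (solve(c) - solve(2.0 * c_hat - c))
        W -= lr * np.outer(g_c, x)           # chain rule through the linear predictor

regret = np.mean([(solve(W @ x) - solve(c)) @ c for x, c in zip(X, C)])
print("average decision regret after training:", round(float(regret), 4))
```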
Machine learning models are increasingly deployed in high-stakes environments, e.g., in medical decision support or loan approval, where ethical and legal considerations require models to be interpretable. I briefly review challenges in designing interpretable machine learning (IML) methods and then argue that we can resolve some of these issues by adopting a causal perspective on the problem. In particular, I argue that we need to distinguish between interpreting the model and using the model to interpret the data-generating process and demonstrate how a causal perspective can disentangle these two questions. |
Panelists of the round table "Responsible Machine Learning: what can we do at our level?" (Wednesday the 16th at 2:30 pm)
Professor, Télécom Paris, LTCI, Institut Polytechnique de Paris
Research Scientist - Google Brain, Professor, CREST - ENSAE, Institut Polytechnique de Paris
Professor, Université de Rennes I, IUF
INRIA Research Director, LIP, ENS-Lyon
Director of Research, Criteo AI Lab
Professor, Université de Lille, INRIA Lille