You can also find my publications on Google Scholar.
Here is a list of recent publications:
S. Tamagnone, A. Laio, MG, "Coarse-Grained Molecular Dynamics with Normalizing Flows", Journal of Chemical Theory and Computation, 2024.
Abstract: We propose a sampling algorithm relying on a collective variable (CV) of midsize dimension modeled by a normalizing flow and using nonequilibrium dynamics to propose full configurational moves from the proposition of a refreshed value of the CV made by the flow. The algorithm takes the form of a Markov chain with nonlocal updates, allowing jumps through energy barriers across metastable states. The flow is trained throughout the algorithm to reproduce the free energy landscape of the CV. The output of the algorithm is a sample of thermalized configurations and the trained network that can be used to efficiently produce more configurations. We show the functioning of the algorithm first in a test case with a mixture of Gaussians. Then, we successfully tested it on a higher-dimensional system consisting of a polymer in solution with a compact state and an extended stable state separated by a high free energy barrier.
BibTeX:
@article{tamagnoneCoarseGrainedMolecularDynamics2024, author = {Tamagnone, Samuel and Laio, Alessandro and \myname}, title = {Coarse-Grained Molecular Dynamics with Normalizing Flows}, journal = {Journal of Chemical Theory and Computation}, publisher = {American Chemical Society}, year = {2024}}
C. Schönle, MG, T. Lelièvre, G. Stoltz, "Sampling Metastable Systems Using Collective Variables and Jarzynski-Crooks Paths", N. arXiv:2405.18160, 2024.
Abstract: We consider the problem of sampling a high dimensional multimodal target probability measure. We assume that a good proposal kernel to move only a subset of the degrees of freedom (also known as collective variables) is known a priori. This proposal kernel can for example be built using normalizing flows. We show how to extend the move from the collective variable space to the full space and how to implement an accept-reject step in order to get a reversible chain with respect to a target probability measure. The accept-reject step does not require knowing the marginal of the original measure in the collective variable (namely knowing the free energy). The obtained algorithm admits several variants, some of them being very close to methods which have been proposed previously in the literature. We show how the obtained acceptance ratio can be expressed in terms of the work which appears in the Jarzynski-Crooks equality, at least for some variants. Numerical illustrations demonstrate the efficiency of the approach on various simple test cases, and allow us to compare the variants of the algorithm.
BibTeX:
@misc{schonleSamplingMetastableSystems2024, author = {Schönle, Christoph and \myname and Lelièvre, Tony and Stoltz, Gabriel}, title = {Sampling Metastable Systems Using Collective Variables and Jarzynski-Crooks Paths}, number = {arXiv:2405.18160}, publisher = {arXiv}, year = {2024}}
L. Grenioux, M. Noble, MG, A. O. Durmus, "Stochastic Localization via Iterative Posterior Sampling", Proceedings of the 41st International Conference on Machine Learning, pp 16337--16376, 2024.
Abstract: Building upon score-based learning, new interest in stochastic localization techniques has recently emerged. In these models, one seeks to noise a sample from the data distribution through a stochastic process, called observation process, and progressively learns a denoiser associated to this dynamics. Apart from specific applications, the use of stochastic localization for the problem of sampling from an unnormalized target density has not been explored extensively. This work contributes to fill this gap. We consider a general stochastic localization framework and introduce an explicit class of observation processes, associated with flexible denoising schedules. We provide a complete methodology, Stochastic Localization via Iterative Posterior Sampling (SLIPS), to obtain approximate samples of these dynamics, and as a by-product, samples from the target distribution. Our scheme is based on a Markov chain Monte Carlo estimation of the denoiser and comes with detailed practical guidelines. We illustrate the benefits and applicability of SLIPS on several benchmarks of multi-modal distributions, including Gaussian mixtures in increasing dimensions, Bayesian logistic regression and a high-dimensional field system from statistical mechanics.
BibTeX:
@inproceedings{greniouxStochasticLocalizationIterative2024, author = {Grenioux, Louis and Noble, Maxence and \myname and Durmus, Alain Oliviero}, title = {Stochastic Localization via Iterative Posterior Sampling}, pages = {16337--16376}, booktitle = {Proceedings of the 41st International Conference on Machine Learning}, publisher = {PMLR}, year = {2024}}
A. Molina-Taborda, P. Cossio, O. Lopez-Acevedo, MG, "Active learning of Boltzmann samplers and potential energies with quantum mechanical accuracy", 2024.
Abstract: Extracting consistent statistics between relevant free-energy minima of a molecular system is essential for physics, chemistry and biology. Molecular dynamics (MD) simulations can aid in this task but are computationally expensive, especially for systems that require quantum accuracy. To overcome this challenge, we develop an approach combining enhanced sampling with deep generative models and active learning of a machine learning potential (MLP). We introduce an adaptive Markov chain Monte Carlo framework that enables the training of one Normalizing Flow (NF) and one MLP per state. We simulate several Markov chains in parallel until they reach convergence, sampling the Boltzmann distribution with an efficient use of energy evaluations. At each iteration, we compute the energy of a subset of the NF-generated configurations using Density Functional Theory (DFT), we predict the remaining configurations' energies with the MLP and actively train the MLP using the DFT-computed energies. Leveraging the trained NF and MLP models, we can compute thermodynamic observables such as free-energy differences or optical spectra. We apply this method to study the isomerization of an ultrasmall silver nanocluster, belonging to a set of systems with diverse applications in the fields of medicine and catalysis.
BibTeX:
@misc{molina-tabordaActiveLearningBoltzmann2024, author = {Molina-Taborda, Ana and Cossio, Pilar and Lopez-Acevedo, Olga and \myname}, title = {Active learning of Boltzmann samplers and potential energies with quantum mechanical accuracy}, publisher = {arXiv}, year = {2024}}
L. Grenioux, E. Moulines, MG, "Balanced Training of Energy-Based Models with Adaptive Flow Sampling", ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling, 2023.
Abstract: Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.
BibTeX:
@inproceedings{greniouxBalancedTrainingEnergy2023, author = {Grenioux, Louis and Moulines, Eric and \myname}, title = {Balanced Training of Energy-Based Models with Adaptive Flow Sampling}, booktitle = {ICML 2023 Workshop on Structured Probabilistic Inference \& Generative Modeling}, year = {2023}}
L. Grenioux, A. O. Durmus, E. Moulines, MG, "On Sampling with Approximate Transport Maps", Proceedings of the 40th International Conference on Machine Learning, pp 11698--11733, 2023.
Abstract: Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target. NF-enhanced samplers recently proposed blend (Markov chain) Monte Carlo methods with either (i) proposal draws from the flow or (ii) a flow-based reparametrization. In both cases, the quality of the learned transport conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.
BibTeX:
@inproceedings{grenioux_sampling_2023, author = {Grenioux, Louis and Durmus, Alain Oliviero and Moulines, Eric and \myname}, title = {On Sampling with Approximate Transport Maps}, pages = {11698--11733}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, publisher = {PMLR}, year = {2023}}
C. Schönle, MG, "Optimizing Markov Chain Monte Carlo Convergence with Normalizing Flows and Gibbs Sampling", NeurIPS 2023 AI for Science Workshop, 2023. |
Abstract: Generative models have started to integrate into the scientific computing toolkit. One notable instance of this integration is the utilization of normalizing flows (NF) in the development of sampling and variational inference algorithms. This work introduces a novel algorithm, GflowMC, which relies on a Metropolis-within-Gibbs framework within the latent space of NFs. This approach addresses the challenge of vanishing acceptance probabilities often encountered when using NF-generated independent proposals, while retaining non-local updates, enhancing its suitability for sampling multi-modal distributions. We assess GflowMC's performance concentrating on the \$\textbackslashphi\textasciicircum4\$ model from statistical mechanics. Our results demonstrate that by identifying an optimal size for partial updates, convergence of the Markov Chain Monte Carlo (MCMC) can be achieved faster than with full updates. Additionally, we explore the adaptability of GflowMC for biasing proposals towards increasing the update frequency of critical coordinates, such as coordinates highly correlated to mode switching in multi-modal targets. |
BibTeX:
@inproceedings{schonleOptimizingMarkovChain2023, author = {Schönle, Christoph and \myname}, title = {Optimizing Markov Chain Monte Carlo Convergence with Normalizing Flows and Gibbs Sampling}, booktitle = {NeurIPS 2023 AI for Science Workshop}, year = {2023}}
MG, S. Ganguli, C. Lucibello, R. Zecchina, "Neural Networks: From the Perceptron to Deep Nets", Spin Glass Theory and Far Beyond, pp 477--497, 2023.
BibTeX:
@inbook{gabrieNeuralNetworksPerceptron2023, author = {\myname and Ganguli, Surya and Lucibello, Carlo and Zecchina, Riccardo}, title = {Neural Networks: From the Perceptron to Deep Nets}, pages = {477--497}, booktitle = {Spin Glass Theory and Far Beyond}, publisher = {World Scientific}, year = {2023}}
S. Samsonov, E. Lagutin, MG, A. Durmus, A. Naumov, E. Moulines, "Local-Global MCMC kernels: the best of both worlds", Advances in Neural Information Processing Systems, 2022.
BibTeX:
@inproceedings{samsonov2022localglobal, author = {Samsonov, Sergey and Lagutin, Evgeny and \myname and Durmus, Alain and Naumov, Alexey and Moulines, Eric}, title = {Local-Global MCMC kernels: the best of both worlds}, booktitle = {Advances in Neural Information Processing Systems}, year = {2022}}
MG, G. M. Rotskoff, E. Vanden-Eijnden, "Adaptive Monte Carlo augmented with normalizing flows", Proceedings of the National Academy of Sciences, Vol. 119, N. 10, 2022.
Abstract: Monte Carlo methods, tools for sampling data from probability distributions, are widely used in the physical sciences, applied mathematics, and Bayesian statistics. Nevertheless, there are many situations in which it is computationally prohibitive to use Monte Carlo due to slow “mixing” between modes of a distribution unless hand-tuned algorithms are used to accelerate the scheme. Machine learning techniques based on generative models offer a compelling alternative to the challenge of designing efficient schemes for a specific system. Here, we formalize Monte Carlo augmented with normalizing flows and show that, with limited prior data and a physically inspired algorithm, we can substantially accelerate sampling with generative models.
BibTeX:
@article{Gabrie2021, author = {\myname and Rotskoff, Grant M. and Vanden-Eijnden, Eric}, title = {Adaptive Monte Carlo augmented with normalizing flows}, journal = {Proceedings of the National Academy of Sciences}, volume = {119}, number = {10}, year = {2022}}
J. Brofos, MG, M. A. Brubaker, R. R. Lederman, "Adaptation of the Independent Metropolis-Hastings Sampler with Normalizing Flow Proposals", Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, pp 5949--5986, 2022.
Abstract: Markov Chain Monte Carlo (MCMC) methods are a powerful tool for computation with complex probability distributions. However the performance of such methods is critically dependent on properly tuned parameters, most of which are difficult if not impossible to know a priori for a given target distribution. Adaptive MCMC methods aim to address this by allowing the parameters to be updated during sampling based on previous samples from the chain at the expense of requiring a new theoretical analysis to ensure convergence. In this work we extend the convergence theory of adaptive MCMC methods to a new class of methods built on a powerful class of parametric density estimators known as normalizing flows. In particular, we consider an independent Metropolis-Hastings sampler where the proposal distribution is represented by a normalizing flow whose parameters are updated using stochastic gradient descent. We explore the practical performance of this procedure on both synthetic settings and in the analysis of a physical field system, and compare it against both adaptive and non-adaptive MCMC methods.
BibTeX:
@inproceedings{brofosAdaptationIndependentMetropolisHastings2022, author = {Brofos, James and \myname and Brubaker, Marcus A. and Lederman, Roy R.}, title = {Adaptation of the Independent Metropolis-Hastings Sampler with Normalizing Flow Proposals}, pages = {5949--5986}, booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics}, publisher = {PMLR}, year = {2022}}
A. Dawid, J. Arnold, B. Requena, A. Gresch, M. Płodzień, K. Donatella, K. A. Nicoli, P. Stornati, R. Koch, M. Büttner, "Modern Applications of Machine Learning in Quantum Sciences", arXiv preprint arXiv:2204.04198, 2022.
BibTeX:
@article{dawidModernApplicationsMachine2022, author = {Dawid, Anna and Arnold, Julian and Requena, Borja and Gresch, Alexander and Płodzień, Marcin and Donatella, Kaelan and Nicoli, Kim A. and Stornati, Paolo and Koch, Rouven and Büttner, Miriam}, title = {Modern Applications of Machine Learning in Quantum Sciences}, journal = {arXiv preprint arXiv:2204.04198}, year = {2022}}
H. Lawrence, D. Barmherzig, H. Li, M. Eickenberg, MG, "Phase Retrieval with Holography and Untrained Priors: Tackling the Challenges of Low-Photon Nanoscale Imaging", Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, pp 516--567, 2022.
Abstract: Phase retrieval is the inverse problem of recovering a signal from magnitude-only Fourier measurements, and underlies numerous imaging modalities, such as Coherent Diffraction Imaging (CDI). A variant of this setup, known as holography, includes a reference object that is placed adjacent to the specimen of interest before measurements are collected. The resulting inverse problem, known as holographic phase retrieval, is well-known to have improved problem conditioning relative to the original. This innovation, i.e. Holographic CDI, becomes crucial at the nanoscale, where imaging specimens such as viruses, proteins, and crystals require low-photon measurements. This data is highly corrupted by Poisson shot noise, and often lacks low-frequency content as well. In this work, we introduce a dataset-free deep learning framework for holographic phase retrieval adapted to these challenges. The key ingredients of our approach are the explicit and flexible incorporation of the physical forward model into an automatic differentiation procedure, the Poisson log-likelihood objective function, and an optional untrained deep image prior. We perform extensive evaluation under realistic conditions. Compared to competing classical methods, our method recovers signal from higher noise levels and is more resilient to suboptimal reference design, as well as to large missing regions of low frequencies in the observations. Finally, we show that these properties carry over to experimental data acquired on optical wavelengths. To the best of our knowledge, this is the first work to consider a dataset-free machine learning approach for holographic phase retrieval.
BibTeX:
@inproceedings{lawrencePhaseRetrievalHolography2022, author = {Lawrence, Hannah and Barmherzig, David and Li, Henry and Eickenberg, Michael and \myname}, title = {Phase Retrieval with Holography and Untrained Priors: Tackling the Challenges of Low-Photon Nanoscale Imaging}, pages = {516--567}, booktitle = {Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference}, publisher = {PMLR}, year = {2022}}
C. Domingo-Enrich, A. Bietti, MG, J. Bruna, E. Vanden-Eijnden, "Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks", N. arXiv:2107.05134, 2021.
Abstract: Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is nonconvex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the active (aka feature-learning) and lazy regimes. In the active regime, this dual formulation leads to a training algorithm in which one updates concurrently the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. Using intermediate parameter setups in our dual algorithm thereby gives a way to interpolate between maximum likelihood and score matching training. These results are illustrated in simple numerical experiments.
BibTeX:
@misc{domingo-enrichDualTrainingEnergyBased2022, author = {Domingo-Enrich, Carles and Bietti, Alberto and \myname and Bruna, Joan and Vanden-Eijnden, Eric}, title = {Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks}, number = {arXiv:2107.05134}, publisher = {arXiv}, year = {2021}}
MG, G. M. Rotskoff, E. Vanden-Eijnden, "Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods", Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML Workshop), 2021.
BibTeX:
@inproceedings{Gabrie2021a, author = {\myname and Rotskoff, Grant M. and Vanden-Eijnden, Eric}, title = {Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods}, booktitle = {Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (ICML Workshop)}, year = {2021}}
H. Lawrence, D. A. Barmherzig, M. Eickenberg, MG, "Low-Photon Holographic Phase Retrieval via a Deep Decoder Neural Network", OSA Optical Sensors and Sensing Congress 2021 (AIS, FTS, HISE, SENSORS, ES), pp JTu5A.19, 2021.
Abstract: A deep decoder neural network is applied towards the enhancement of recent algorithms for holographic Coherent Diffraction Imaging (CDI). This method does not require training data, and provides improved imaging given noisy low-photon CDI data.
BibTeX:
@inproceedings{Lawrence:21, author = {Lawrence, Hannah and Barmherzig, David A. and Eickenberg, Michael and \myname}, title = {Low-Photon Holographic Phase Retrieval via a Deep Decoder Neural Network}, pages = {JTu5A.19}, booktitle = {OSA Optical Sensors and Sensing Congress 2021 (AIS, FTS, HISE, SENSORS, ES)}, publisher = {Optica Publishing Group}, year = {2021}}
S. d'Ascoli, MG, L. Sagun, G. Biroli, "On the Interplay between Data Structure and Loss Function in Classification Problems", Advances in Neural Information Processing Systems, Vol. 34, pp 8506--8517, 2021.
Abstract: One of the central features of modern machine learning models, including deep neural networks, is their generalization ability on structured data in the over-parametrized regime. In this work, we consider an analytically solvable setup to investigate how properties of data impact learning in classification problems, and compare the results obtained for quadratic loss and logistic loss. Using methods from statistical physics, we obtain a precise asymptotic expression for the train and test errors of random feature models trained on a simple model of structured data. The input covariance is built from independent blocks allowing us to tune the saliency of low-dimensional structures and their alignment with respect to the target function. Our results show in particular that in the over-parametrized regime, the impact of data structure on both train and test error curves is greater for logistic loss than for mean-squared loss: the easier the task, the wider the gap in performance between the two losses at the advantage of the logistic. Numerical experiments on MNIST and CIFAR10 confirm our insights.
BibTeX:
@inproceedings{dascoliInterplayDataStructure2021a, author = {d'Ascoli, Stéphane and \myname and Sagun, Levent and Biroli, Giulio}, title = {On the Interplay between Data Structure and Loss Function in Classification Problems}, volume = {34}, pages = {8506--8517}, booktitle = {Advances in Neural Information Processing Systems}, publisher = {Curran Associates, Inc.}, year = {2021}}
MG, J. Barbier, F. Krzakala, L. Zdeborová, "Blind Calibration for Compressed Sensing: State Evolution and an Online Algorithm", Journal of Physics A: Mathematical and Theoretical, Vol. 53, N. 33, pp 334004, 2020.
Abstract: Compressed sensing allows for the acquisition of compressible signals with a small number of measurements. In experimental settings, the sensing process corresponding to the hardware implementation is not always perfectly known and may require a calibration. To this end, blind calibration proposes to perform at the same time the calibration and the compressed sensing. Schülke and collaborators suggested an approach based on approximate message passing for blind calibration (cal-AMP) in (Schülke C et al 2013 Advances in Neural Information Processing Systems 26 1--9 and Schülke C et al 2015 J. Stat. Mech. P11013). Here, their algorithm is extended from the already proposed offline case to the online case, for which the calibration is refined step by step as new measured samples are received. We show that the performance of both the offline and the online algorithms can be theoretically studied via the state evolution formalism. Finally, the efficiency of cal-AMP and the consistency of the theoretical predictions are confirmed through numerical simulations.
BibTeX:
@article{gabrieBlindCalibrationCompressed2020, author = {\myname and Barbier, Jean and Krzakala, Florent and Zdeborová, Lenka}, title = {Blind Calibration for Compressed Sensing: State Evolution and an Online Algorithm}, journal = {Journal of Physics A: Mathematical and Theoretical}, volume = {53}, number = {33}, pages = {334004}, publisher = {IOP Publishing}, year = {2020}}
MG, "Mean-Field Inference Methods for Neural Networks", Journal of Physics A: Mathematical and Theoretical, Vol. 53, N. 22, pp 223002, 2020. |
Abstract: Machine learning algorithms relying on deep neural networks recently allowed a great leap forward in artificial intelligence. Despite the popularity of their applications, the efficiency of these algorithms remains largely unexplained from a theoretical point of view. The mathematical description of learning problems involves very large collections of interacting random variables, difficult to handle analytically as well as numerically. This complexity is precisely the object of study of statistical physics. Its mission, originally pointed towards natural systems, is to understand how macroscopic behaviors arise from microscopic laws. Mean-field methods are one type of approximation strategy developped in this view. We review a selection of classical mean-field methods and recent progress relevant for inference in neural networks. In particular, we remind the principles of derivations of high-temperature expansions, the replica method and message passing algorithms, highligthing their equivalences and complementarities. We also provide references for past and current directions of research on neural networks relying on mean-field methods. |
BibTeX:
@article{gabrieMeanfieldInferenceMethods2020, author = {\myname}, title = {Mean-Field Inference Methods for Neural Networks}, journal = {Journal of Physics A: Mathematical and Theoretical}, volume = {53}, number = {22}, pages = {223002}, year = {2020}}
MG, "Towards an Understanding of Neural Networks : Mean-Field Incursions", Université Paris sciences et lettres, 2019. |
Abstract: Machine learning algorithms relying on deep new networks recently allowed a great leap forward in artificial intelligence. Despite the popularity of their applications, the efficiency of these algorithms remains largely unexplained from a theoretical point of view. The mathematical descriptions of learning problems involves very large collections of interacting random variables, difficult to handle analytically as well as numerically. This complexity is precisely the object of study of statistical physics. Its mission, originally pointed towards natural systems, is to understand how macroscopic behaviors arise from microscopic laws. In this thesis we propose to take advantage of the recent progress in mean-field methods from statistical physics to derive relevant approximations in this context. We exploit the equivalences and complementarities of message passing algorithms, high-temperature expansions and the replica method. Following this strategy we make practical contributions for the unsupervised learning of Boltzmann machines. We also make theoretical contributions considering the teacher-student paradigm to model supervised learning problems. We develop a framework to characterize the evolution of information during training in these model. Additionally, we propose a research direction to generalize the analysis of Bayesian learning in shallow neural networks to their deep counterparts. |
BibTeX:
@phdthesis{gabrieUnderstandingNeuralNetworks2019, author = {\myname}, title = {Towards an Understanding of Neural Networks: Mean-Field Incursions}, school = {Université Paris sciences et lettres}, year = {2019}}
MG, J. Barbier, F. Krzakala, L. Zdeborová, "Blind Calibration for Sparse Regression: A State Evolution Analysis", 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp 649--653, 2019.
BibTeX:
@inproceedings{9022479, author = {\myname and Barbier, Jean and Krzakala, Florent and Zdeborová, Lenka}, title = {Blind Calibration for Sparse Regression: A State Evolution Analysis}, pages = {649--653}, booktitle = {2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)}, year = {2019}}
MG, A. Manoel, C. Luneau, J. Barbier, N. Macris, F. Krzakala, L. Zdeborová, "Entropy and Mutual Information in Models of Deep Neural Networks", Advances in Neural Information Processing Systems, Vol. 31, 2018.
BibTeX:
@inproceedings{gabrieEntropyMutualInformation2018a, author = {\myname and Manoel, Andre and Luneau, Clément and Barbier, Jean and Macris, Nicolas and Krzakala, Florent and Zdeborová, Lenka}, title = {Entropy and Mutual Information in Models of Deep Neural Networks}, volume = {31}, booktitle = {Advances in Neural Information Processing Systems}, publisher = {Curran Associates, Inc.}, year = {2018}}
E. W. Tramel, MG, A. Manoel, F. Caltagirone, F. Krzakala, "Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines", Physical Review X, Vol. 8, N. 4, pp 041006, 2018.
BibTeX:
@article{tramelDeterministicGeneralizedFramework2018, author = {Tramel, Eric W. and \myname and Manoel, Andre and Caltagirone, Francesco and Krzakala, Florent}, title = {Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines}, journal = {Physical Review X}, volume = {8}, number = {4}, pages = {041006}, year = {2018}}