Publications
Preprints and publications are presented in reverse chronological order.
2024
- [ICML ESM] Learning Optimal Filters Using Variational Inference. Enoch Luk, Eviatar Bach, Ricardo Baptista, and Andrew Stuart. In ICML Machine Learning for Earth System Modeling Workshop, 2024.
Filtering, the task of estimating the conditional distribution of the states of a dynamical system given partial, noisy observations, is important in many areas of science and engineering, including weather and climate prediction. However, the filtering distribution is generally intractable to obtain for high-dimensional, nonlinear systems. Filters used in practice, such as the ensemble Kalman filter (EnKF), are biased for nonlinear systems and have numerous tuning parameters. Here, we present a framework for learning a parameterized analysis map, the map that takes a forecast distribution and observations to the filtering distribution, using variational inference. We show that this methodology can be used to learn gain matrices for filtering linear and nonlinear dynamical systems, as well as inflation and localization parameters for an EnKF. Future work will apply this framework to learn new filtering algorithms.
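To make the learned-analysis-map idea concrete, here is a minimal sketch (my own toy setup, not the paper's variational formulation): a constant gain K for the update x_a = x_f + K(y - H x_f) is learned by gradient descent on a Monte Carlo squared-error loss over simulated (truth, forecast, observation) triples. The observation operator H, the noise levels, and the loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 2, 1, 5000                       # state dim, obs dim, sample size
H = np.array([[1.0, 0.0]])                 # observe the first state component

# joint samples (truth, forecast, observation)
x = rng.normal(size=(n, d))
xf = x + 0.5 * rng.normal(size=(n, d))     # forecast = truth + forecast error
y = x @ H.T + 0.1 * rng.normal(size=(n, m))

K = np.zeros((d, m))
lr = 0.5
for _ in range(200):
    r = y - xf @ H.T                       # innovations
    resid = x - (xf + r @ K.T)             # analysis errors
    grad = -2.0 * resid.T @ r / n          # gradient of the mean squared error
    K -= lr * grad

print("learned gain:\n", np.round(K, 3))   # close to the optimal linear gain
```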
- [PNAS] Codiscovering graphical structure and functional relationships within data: A Gaussian Process framework for connecting the dots. Théo Bourdais, Pau Batlle, Xianjin Yang, Ricardo Baptista, Nicolas Rouquette, and Houman Owhadi. Proceedings of the National Academy of Sciences, 2024.
Many complex data analysis problems within and beyond the scientific domain involve discovering graphical structures and functional relationships within data. Nonlinear variance decomposition with Gaussian Processes simplifies and automates this process. Other methods, such as artificial neural networks, lack this variance decomposition feature. Information-theoretic and causal inference methods suffer from super-exponential complexity with respect to the number of variables. The proposed technique performs this task in polynomial complexity. This unlocks the potential for applications involving the identification of a network of hidden relationships between variables without a parameterized model at a remarkable scale, scope, and complexity.
Most problems within and beyond the scientific domain can be framed into one of the following three levels of complexity of function approximation.
- Type 1: Approximate an unknown function given input/output data.
- Type 2: Consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph where edges can connect more than two vertices). Given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions.
- Type 3: Expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions.
These hypergraphs offer a natural platform for organizing, communicating, and processing computational knowledge. While most scientific problems can be framed as the data-driven discovery of unknown functions in a computational hypergraph whose structure is known (Type 2), many require the data-driven discovery of the structure (connectivity) of the hypergraph itself (Type 3). We introduce an interpretable Gaussian Process (GP) framework for such (Type 3) problems that does not require randomization of the data, access to or control over its sampling, or sparsity of the unknown functions in a known or learned basis. Its polynomial complexity, which contrasts sharply with the super-exponential complexity of causal inference methods, is enabled by the nonlinear ANOVA capabilities of GPs used as a sensing mechanism.
- [AMS] An approximation theory framework for measure-transport sampling algorithms. Ricardo Baptista, Bamdad Hosseini, Nikola B Kovachki, Youssef M Marzouk, and Amir Sagiv. Mathematics of Computation, 2024.
This article presents a general approximation-theoretic framework to analyze measure transport algorithms for probabilistic modeling. A primary motivating application for such algorithms is sampling – a central task in statistical inference and generative modeling. We provide a priori error estimates in the continuum limit, i.e., when the measures (or their densities) are given, but when the transport map is discretized or approximated using a finite-dimensional function space. Our analysis relies on the regularity theory of transport maps and on classical approximation theory for high-dimensional functions. A third element of our analysis, which is of independent interest, is the development of new stability estimates that relate the distance between two maps to the distance (or divergence) between the pushforward measures they define. We present a series of applications of our framework, where quantitative convergence rates are obtained for practical problems using Wasserstein metrics, maximum mean discrepancy, and Kullback–Leibler divergence. Specialized rates for approximations of the popular triangular Knothe–Rosenblatt maps are obtained, followed by numerical experiments that demonstrate and extend our theory.
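The stability idea, namely that the distance between maps controls the distance between their pushforwards, can be illustrated in one dimension, where the Knothe–Rosenblatt map from N(0, 1) to a target is the composition of the target inverse CDF with the standard normal CDF. The sketch below is a toy of mine, not the paper's experiments; the Gamma target and Hermite basis are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.normal(size=20000)                 # reference samples
T_exact = stats.gamma(a=3.0).ppf(stats.norm.cdf(z))     # exact KR map evaluations

for deg in [1, 3, 5, 9]:
    # polynomial approximation of the map in a Hermite basis
    coeffs = np.polynomial.hermite_e.hermefit(z, T_exact, deg)
    T_hat = np.polynomial.hermite_e.hermeval(z, coeffs)
    map_err = np.sqrt(np.mean((T_hat - T_exact) ** 2))       # L2(reference) map error
    w1 = np.mean(np.abs(np.sort(T_hat) - np.sort(T_exact)))  # W1 between pushforwards
    print(f"degree {deg}: map error {map_err:.3f}, W1 {w1:.3f}")
```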
- [JCP] Bayesian model calibration for block copolymer self-assembly: Likelihood-free inference and expected information gain computation via measure transport. Ricardo Baptista, Lianghao Cao, Joshua Chen, Omar Ghattas, Fengyi Li, Youssef M Marzouk, and J Tinsley Oden. Journal of Computational Physics, 2024.
We consider the Bayesian calibration of models describing the phenomenon of block copolymer (BCP) self-assembly using image data produced by microscopy or X-ray scattering techniques. To account for the random long-range disorder in BCP equilibrium structures, we introduce auxiliary variables to represent this aleatory uncertainty. These variables, however, result in an integrated likelihood for high-dimensional image data that is generally intractable to evaluate. We tackle this challenging Bayesian inference problem using a likelihood-free approach based on measure transport together with the construction of summary statistics for the image data. We also show that expected information gains (EIGs) from the observed data about the model parameters can be computed with no significant additional cost. Lastly, we present a numerical case study based on the Ohta–Kawasaki model for diblock copolymer thin film self-assembly and top-down microscopy characterization. For calibration, we introduce several domain-specific energy- and Fourier-based summary statistics, and quantify their informativeness using EIG. We demonstrate the power of the proposed approach to study the effect of data corruptions and experimental designs on the calibration results.
- [arXiv] Neural Approximate Mirror Maps for Constrained Diffusion Models. Berthy T Feng, Ricardo Baptista, and Katherine L Bouman. arXiv:2406.12816, 2024.
Diffusion models excel at creating visually convincing images, but they often struggle to meet subtle constraints inherent in the training data. Such constraints could be physics-based (e.g., satisfying a PDE), geometric (e.g., respecting symmetry), or semantic (e.g., including a particular number of objects). When the training data all satisfy a certain constraint, enforcing this constraint on a diffusion model not only improves its distribution-matching accuracy but also makes it more reliable for generating valid synthetic data and solving constrained inverse problems. However, existing methods for constrained diffusion models are inflexible with different types of constraints. Recent work proposed to learn mirror diffusion models (MDMs) in an unconstrained space defined by a mirror map and to impose the constraint with an inverse mirror map, but analytical mirror maps are challenging to derive for complex constraints. We propose neural approximate mirror maps (NAMMs) for general constraints. Our approach only requires a differentiable distance function from the constraint set. We learn an approximate mirror map that pushes data into an unconstrained space and a corresponding approximate inverse that maps data back to the constraint set. A generative model, such as an MDM, can then be trained in the learned mirror space and its samples restored to the constraint set by the inverse map. We validate our approach on a variety of constraints, showing that compared to an unconstrained diffusion model, a NAMM-based MDM substantially improves constraint satisfaction. We also demonstrate how existing diffusion-based inverse-problem solvers can be easily applied in the learned mirror space to solve constrained inverse problems.
- [arXiv] TrIM: Transformed Iterative Mondrian Forests for Gradient-based Dimension Reduction and High-Dimensional Regression. Ricardo Baptista, Eliza O’Reilly, and Yangxinyu Xie. arXiv:2407.09964, 2024.
We propose a computationally efficient algorithm for gradient-based linear dimension reduction and high-dimensional regression. The algorithm initially computes a Mondrian forest and uses this estimator to identify a relevant feature subspace of the inputs from an estimate of the expected gradient outer product (EGOP) of the regression function. In addition, we introduce an iterative approach known as Transformed Iterative Mondrian (TrIM) forest to improve the Mondrian forest estimator by using the EGOP estimate to update the set of features and weights used by the Mondrian partitioning mechanism. We obtain consistency guarantees and convergence rates for the estimation of the EGOP matrix and the random forest estimator obtained from one iteration of the TrIM algorithm. Lastly, we demonstrate the effectiveness of our proposed algorithm for learning the relevant feature subspace across a variety of settings with both simulated and real data.
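A minimal sketch of the EGOP step follows (simplified: finite-difference gradients of the true regression function stand in for the Mondrian-forest gradient estimates the paper uses; the function f and the dimensions are my own toy choices). The top eigenvectors of the EGOP recover the relevant feature subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
A = np.zeros((2, d)); A[0, 0] = A[1, 1] = 1.0       # f depends on x1, x2 only
f = lambda X: np.sin(X @ A[0]) + (X @ A[1]) ** 2

X = rng.normal(size=(500, d))
eps = 1e-4
G = np.zeros((500, d))
for j in range(d):                                   # finite-difference gradients
    e = np.zeros(d); e[j] = eps
    G[:, j] = (f(X + e) - f(X - e)) / (2 * eps)

EGOP = G.T @ G / len(X)                              # estimate of E[grad f grad f^T]
eigval, eigvec = np.linalg.eigh(EGOP)
print("eigenvalues:", np.round(eigval[::-1], 3))     # two dominant directions
print("top-2 subspace:\n", np.round(eigvec[:, -2:], 2))
```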
- [JMLR] Learning non-Gaussian graphical models via Hessian scores and triangular transport. Ricardo Baptista, Youssef Marzouk, Rebecca E Morrison, and Olivier Zahm. Journal of Machine Learning Research, 2024.
Undirected probabilistic graphical models represent the conditional dependencies, or Markov properties, of a collection of random variables. Knowing the sparsity of such a graphical model is valuable for modeling multivariate distributions and for efficiently performing inference. While the problem of learning graph structure from data has been studied extensively for certain parametric families of distributions, most existing methods fail to consistently recover the graph structure for non-Gaussian data. Here we propose an algorithm for learning the Markov structure of continuous and non-Gaussian distributions. To characterize conditional independence, we introduce a score based on integrated Hessian information from the joint log-density, and we prove that this score upper bounds the conditional mutual information for a general class of distributions. To compute the score, our algorithm SING estimates the density using a deterministic coupling, induced by a triangular transport map, and iteratively exploits sparse structure in the map to reveal sparsity in the graph. For certain non-Gaussian datasets, we show that our algorithm recovers the graph structure even with a biased approximation to the density. Among other examples, we apply SING to learn the dependencies between the states of a chaotic dynamical system with local interactions.
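A toy version of the Hessian score (my simplification: the true joint log-density is evaluated directly, whereas SING estimates it through a triangular transport map) shows how the score matrix reveals conditional independence in a three-variable Markov chain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1**2 + rng.normal(size=n)
x3 = x2**2 + rng.normal(size=n)           # Markov chain x1 -> x2 -> x3
X = np.stack([x1, x2, x3], axis=1)

def logpi(x):
    # joint log-density of the chain, up to an additive constant
    return -0.5 * (x[:, 0]**2 + (x[:, 1] - x[:, 0]**2)**2
                   + (x[:, 2] - x[:, 1]**2)**2)

h, d = 1e-3, 3
Omega = np.zeros((d, d))
for i in range(d):
    for j in range(d):                    # mixed second differences of log pi
        ei = np.zeros(d); ei[i] = h
        ej = np.zeros(d); ej[j] = h
        Hij = (logpi(X + ei + ej) - logpi(X + ei - ej)
               - logpi(X - ei + ej) + logpi(X - ei - ej)) / (4 * h * h)
        Omega[i, j] = np.abs(Hij).mean()

print(np.round(Omega, 2))                 # (1,3) entry ~ 0: x1 indep. of x3 given x2
```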
- [arXiv] Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis. Qiao Chen, Elise Arnaud, Ricardo Baptista, and Olivier Zahm. arXiv:2406.13425, 2024.
We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-oriented dimension reduction, where either an input or output quantity of interest is prescribed. We consider, in particular, goal-oriented sensor placement and goal-oriented sensitivity analysis, which can be viewed as dimension reduction where the most important output or, respectively, input components are chosen. Both applications present difficult combinatorial optimization problems with expensive objectives such as the expected information gain and Sobol indices. By optimizing gradient-based bounds, we can determine the most informative sensors and most sensitive parameters as the largest diagonal entries of some diagnostic matrices, thus bypassing the combinatorial optimization and objective evaluation.
- [JUQ] Conditional Sampling with Monotone GANs: from Generative Models to Likelihood-Free Inference. Ricardo Baptista, Bamdad Hosseini, Nikola B Kovachki, and Youssef Marzouk. SIAM/ASA Journal on Uncertainty Quantification, 2024.
We present a novel framework for conditional sampling of probability measures, using block triangular transport maps. We develop the theoretical foundations of block triangular transport in a Banach space setting, establishing general conditions under which conditional sampling can be achieved and drawing connections between monotone block triangular maps and optimal transport. Based on this theory, we then introduce a computational approach, called monotone generative adversarial networks (M-GANs), to learn suitable block triangular maps. Our algorithm uses only samples from the underlying joint probability measure and is hence likelihood-free. Numerical experiments with M-GAN demonstrate accurate sampling of conditional measures in synthetic examples, Bayesian inverse problems involving ordinary and partial differential equations, and probabilistic image in-painting.
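The mechanism can be seen in closed form for a bivariate Gaussian joint, where the second component of a block triangular map is available analytically. This is a hand-computed illustration of the structure M-GAN learns adversarially for general distributions; the covariance values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])      # joint covariance of (y, x)

def f(y, z):
    # second block of the triangular map T(y, z) = (y, f(y, z)):
    # an affine map of z whose output is distributed as x | y
    cmean = Sigma[1, 0] / Sigma[0, 0] * y
    cstd = np.sqrt(Sigma[1, 1] - Sigma[1, 0] ** 2 / Sigma[0, 0])
    return cmean + cstd * z

ystar = 1.5
z = rng.normal(size=10000)                      # reference samples
x_cond = f(ystar, z)                            # samples of x | y = 1.5
print("conditional mean/std:", x_cond.mean().round(3), x_cond.std().round(3))
```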
2023
- [arXiv] Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference. Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, and Deepanshu Verma. arXiv:2310.16975, 2023.
We present two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference, particularly in the simulation-based ("likelihood-free") setting. Our methods represent the target conditional distributions as transformations of a tractable reference distribution and, therefore, fall into the framework of measure transport. Although many measure transport approaches model the transformation as COT maps, obtaining the map is computationally challenging, even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize COT maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
- [FoCM] On the representation and learning of monotone triangular transport maps. Ricardo Baptista, Youssef Marzouk, and Olivier Zahm. Foundations of Computational Mathematics, 2023.
Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, approximations of the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.
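A minimal sketch of the monotone parameterization follows: a map component T(x) = integral from 0 to x of g(f'(t)) dt is strictly increasing for any smooth f whenever g is positive. The polynomial f and softplus rectifier below are illustrative assumptions; in practice the coefficients of f are optimized, e.g., by maximum likelihood.

```python
import numpy as np

g = lambda s: np.log1p(np.exp(s))          # softplus: strictly positive rectifier
fprime = lambda t: 1.0 - 3.0 * t + t**2    # derivative of an illustrative smooth f

def T(x, nquad=201):
    # trapezoidal quadrature of int_0^x g(f'(t)) dt, vectorized over x
    s = np.linspace(0.0, 1.0, nquad)
    vals = g(fprime(s[None, :] * x[:, None])) * x[:, None]   # dt = x ds
    w = np.full(nquad, 1.0 / (nquad - 1))
    w[[0, -1]] *= 0.5
    return vals @ w

x = np.linspace(-3.0, 3.0, 7)
print(np.round(T(x), 3))                   # strictly increasing in x
```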
- [NeurIPS] Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models. Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, and Leonardo Zepeda-Núñez. In Advances in Neural Information Processing Systems, 2023.
We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8× and 16×. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs does not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives.
- [NeurIPS OTML] A generative flow model for conditional sampling via optimal transport. Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, and Ryan Tsai. In NeurIPS Optimal Transport and Machine Learning Workshop, 2023.
Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models characterize conditionals by learning a transport map that pushes forward a reference (e.g., a standard Gaussian) to the target distribution. While these approaches can successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn the map. This work proposes a non-parametric generative model that adaptively maps reference samples to the target. The model uses block-triangular transport maps, whose components characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted L^2 cost function, thereby extending the data-driven approach in Trigila and Tabak for conditional sampling. The proposed approach is demonstrated on a low-dimensional example and a parameter inference problem involving nonlinear ODEs.
- [JCP] Ensemble transport smoothing. Part 1: Unified framework. Maximilian Ramgraber, Ricardo Baptista, Dennis McLaughlin, and Youssef Marzouk. Journal of Computational Physics: X, 2023.
Smoothers are algorithms for Bayesian time series re-analysis. Most operational smoothers rely either on affine Kalman-type transformations or on sequential importance sampling. These strategies occupy opposite ends of a spectrum that trades computational efficiency and scalability for statistical generality and consistency: non-Gaussianity renders affine Kalman updates inconsistent with the true Bayesian solution, while the ensemble size required for successful importance sampling can be prohibitive. This paper revisits the smoothing problem from the perspective of measure transport, which offers the prospect of consistent prior-to-posterior transformations for Bayesian inference. We leverage this capacity by proposing a general ensemble framework for transport-based smoothing. Within this framework, we derive a comprehensive set of smoothing recursions based on nonlinear transport maps and detail how they exploit the structure of state-space models in fully non-Gaussian settings. We also describe how many standard Kalman-type smoothing algorithms emerge as special cases of our framework. A companion paper explores the implementation of nonlinear ensemble transport smoothers in greater depth.
- [JCP] Ensemble transport smoothing. Part 2: Nonlinear updates. Maximilian Ramgraber, Ricardo Baptista, Dennis McLaughlin, and Youssef Marzouk. Journal of Computational Physics: X, 2023.
Smoothing is a specialized form of Bayesian inference for state-space models that characterizes the posterior distribution of a collection of states given an associated sequence of observations. Ramgraber et al. [38] propose a general framework for transport-based ensemble smoothing, which includes linear Kalman-type smoothers as special cases. Here, we build on this foundation to realize and demonstrate nonlinear backward ensemble transport smoothers. We discuss parameterization and regularization of the associated transport maps, and then examine the performance of these smoothers for nonlinear and chaotic dynamical systems that exhibit non-Gaussian behavior. In these settings, our nonlinear transport smoothers yield lower estimation error than conventional linear smoothers and state-of-the-art iterative ensemble Kalman smoothers, for comparable numbers of model evaluations.
- [arXiv] An adaptive ensemble filter for heavy-tailed distributions: tuning-free inflation and localization. Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff D Eldredge. arXiv:2310.08741, 2023.
Heavy tails are a common feature of filtering distributions, arising from nonlinear dynamical and observation processes as well as uncertainty in physical sensors. In these settings, the Kalman filter and its ensemble version, the ensemble Kalman filter (EnKF), which are designed under Gaussian assumptions, suffer degraded performance. t-distributions are a parametric family of distributions whose tail-heaviness is modulated by a degree of freedom ν. Interestingly, Cauchy and Gaussian distributions correspond to the extreme cases of a t-distribution for ν = 1 and ν = ∞, respectively. Leveraging tools from measure transport (Spantini et al., SIAM Review, 2022), we present a generalization of the EnKF whose prior-to-posterior update leads to exact inference for t-distributions. We demonstrate that this filter is less sensitive to outlying synthetic observations generated by the observation model for small ν. Moreover, it recovers the Kalman filter for ν = ∞. For nonlinear state-space models with heavy-tailed noise, we propose an algorithm to estimate the prior-to-posterior update from samples of the joint forecast distribution of the states and observations. We rely on a regularized expectation-maximization (EM) algorithm to estimate the mean, scale matrix, and degree of freedom of heavy-tailed t-distributions from limited samples (Finegold and Drton, arXiv preprint, 2014). Leveraging the conditional independence of the joint forecast distribution, we regularize the scale matrix with an l1 sparsity-promoting penalization of the log-likelihood at each iteration of the EM algorithm. This l1 regularization draws upon the graphical lasso algorithm (Friedman et al., Biostatistics, 2008) for estimating sparse covariance matrices in the Gaussian setting. By sequentially estimating the degree of freedom at each analysis step, our filter adapts the prior-to-posterior update to the tail-heaviness of the data. This new filter intrinsically embeds an adaptive and data-dependent multiplicative inflation mechanism, complemented by adaptive localization through the l1-penalization of the estimated scale matrix. We demonstrate the benefits of this new ensemble filter on challenging filtering problems with heavy-tailed noise.
- [IEEE CSS] Computational Optimal Transport and Filtering on Riemannian manifolds. Daniel Grange, Mohammad Al-Jarrah, Ricardo Baptista, Amirhossein Taghvaei, Tryphon T Georgiou, and Allen Tannenbaum. IEEE Control Systems Letters, 2023.
In this paper we extend recent developments in computational optimal transport to the setting of Riemannian manifolds. In particular, we show how to learn optimal transport maps from samples that relate probability distributions defined on manifolds. Specializing these maps for sampling conditional probability distributions provides an ensemble approach for solving nonlinear filtering problems defined on such geometries. The proposed computational methodology is illustrated with examples of transport and nonlinear filtering on Lie groups, including the circle S^1, the special Euclidean group SE(2), and the special orthogonal group SO(3).
- [NeurIPS] Structured Neural Networks for Density Estimation and Causal Inference. Asic Q Chen, Ruian Shi, Xiang Gao, Ricardo Baptista, and Rahul G Krishnan. In Advances in Neural Information Processing Systems, 2023.
Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.
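A MADE-style special case of the masking idea, as a sketch (StrNN designs masks for general adjacency matrices via binary matrix factorization; the degree assignment below handles only the autoregressive case): output i may depend on input j only if A[i, j] = 1, which holds exactly when the product of the layer masks has the same support as A.

```python
import numpy as np

A = np.array([[0, 0, 0],                  # x1 depends on nothing
              [1, 0, 0],                  # x2 may depend on x1
              [1, 1, 0]])                 # x3 may depend on x1, x2

# assign each unit a degree; connect hidden unit k to input j if deg(j) <= deg(k)
deg_in = np.array([1, 2, 3])
deg_hid = np.array([1, 1, 2, 2])
M1 = (deg_in[None, :] <= deg_hid[:, None]).astype(int)   # hidden-by-input mask
M2 = (deg_hid[None, :] < deg_in[:, None]).astype(int)    # output-by-hidden mask

path = (M2 @ M1 > 0).astype(int)          # support of input-to-output paths
assert (path == A).all()
print("mask product respects adjacency:\n", path)
```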
- [arXiv] Score-based diffusion models in function space. Jae Hyun Lim, Nikola B Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, et al. arXiv:2302.07400, 2023.
Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional spaces, e.g., Euclidean, limiting their application in domains where the data has a functional form, such as scientific computing and 3D geometric data analysis. In this work, we introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process perturbs input functions gradually using a Gaussian process. The generative process is formulated by integrating a function-valued Langevin dynamic. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian Random Field (GRF).
2022
- [SIAM Review] Coupling techniques for nonlinear ensemble filtering. Alessio Spantini, Ricardo Baptista, and Youssef Marzouk. SIAM Review, 2022.
We consider filtering in high-dimensional non-Gaussian state-space models with intractable transition kernels, nonlinear and possibly chaotic dynamics, and sparse observations in space and time. We propose a novel filtering methodology that harnesses transportation of measures, convex optimization, and ideas from probabilistic graphical models to yield robust ensemble approximations of the filtering distribution in high dimensions. Our approach can be understood as the natural generalization of the ensemble Kalman filter (EnKF) to nonlinear updates, using stochastic or deterministic couplings. The use of nonlinear updates can reduce the intrinsic bias of the EnKF at a marginal increase in computational cost. We avoid any form of importance sampling and introduce non-Gaussian localization approaches for dimension scalability. Our framework achieves state-of-the-art tracking performance on challenging configurations of the Lorenz-96 model in the chaotic regime.
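For reference, the linear/Gaussian special case that these nonlinear updates generalize is the stochastic EnKF analysis step, sketched below (the ensemble size, observation operator, and noise levels are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
Ne, m = 200, 1
H = np.array([[1.0, 0.0]])                            # linear observation operator
R = np.array([[0.1]])                                 # observation noise covariance

Xf = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=Ne)
y = np.array([1.2])                                   # observed data

C = np.cov(Xf.T)                                      # forecast sample covariance
K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)          # ensemble Kalman gain
Yp = Xf @ H.T + rng.multivariate_normal(np.zeros(m), R, size=Ne)
Xa = Xf + (y - Yp) @ K.T                              # analysis ensemble
print("analysis mean:", Xa.mean(axis=0).round(3))
```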
- [NeurIPS SBM] Dimension reduction via score ratio matching. Michael Brennan, Ricardo Baptista, and Youssef Marzouk. In NeurIPS 2022 Workshop on Score-Based Methods, 2022.
We propose a method to detect a low-dimensional subspace where a non-Gaussian target distribution departs from a known reference distribution (e.g., a standard Gaussian). We identify this subspace from gradients of the log-ratio between the target and reference densities, which we call the score ratio. Given only samples from the target distribution, we estimate these gradients via score ratio matching, with a tailored parameterization and a regularization method that expose the low-dimensional structure we seek. We show that our approach outperforms standard score matching for dimension reduction of in-class distributions, and that several benchmark UCI datasets in fact exhibit this type of low dimensionality.
- [JMA] Diagonal nonlinear transformations preserve structure in covariance and precision matrices. Rebecca Morrison, Ricardo Baptista, and Estelle Basor. Journal of Multivariate Analysis, 2022.
For a multivariate normal distribution, the sparsity of the covariance and precision matrices encodes complete information about independence and conditional independence properties. For general distributions, the covariance and precision matrices reveal correlations and so-called partial correlations between variables, but these do not, in general, have any correspondence with respect to independence properties. In this paper, we prove that, for a certain class of non-Gaussian distributions, these correspondences still hold, exactly for the covariance and approximately for the precision. The distributions – sometimes referred to as “nonparanormal” – are given by diagonal transformations of multivariate normal random variables. We provide several analytic and numerical examples illustrating these results.
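The covariance statement is easy to check numerically (a toy of my own construction with three variables; the cubic transformation is an arbitrary strictly increasing choice):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6, 0.0],
                  [0.6, 1.0, 0.6],
                  [0.0, 0.6, 1.0]])        # x1 and x3 are (marginally) independent
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200000)
Y = X + X**3                               # diagonal, strictly increasing transformation

print(np.round(np.cov(Y.T), 2))            # the (1, 3) zero survives the transformation
```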
- [arXiv] Gradient-based data and parameter dimension reduction for Bayesian models: an information theoretic perspective. Ricardo Baptista, Youssef Marzouk, and Olivier Zahm. arXiv:2207.08670, 2022.
We consider the problem of reducing the dimensions of parameters and data in non-Gaussian Bayesian inference problems. Our goal is to identify an “informed” subspace of the parameters and an “informative” subspace of the data so that a high-dimensional inference problem can be approximately reformulated in low-to-moderate dimensions, thereby improving the computational efficiency of many inference techniques. To do so, we exploit gradient evaluations of the log-likelihood function. Furthermore, we use an information-theoretic analysis to derive a bound on the posterior error due to parameter and data dimension reduction. This bound relies on logarithmic Sobolev inequalities, and it reveals the appropriate dimensions of the reduced variables. We compare our method with classical dimension reduction techniques, such as principal component analysis and canonical correlation analysis, on applications ranging from mechanics to image processing.
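A sketch of the gradient-based diagnostic on a toy linear-Gaussian model (my example; the forward map G and noise level are arbitrary): the informed parameter subspace is spanned by the dominant eigenvectors of the matrix built from outer products of log-likelihood gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 2, 5000
G = np.zeros((m, d)); G[0, 0], G[1, 1] = 2.0, 1.0    # data only sees x1, x2
sig = 0.1                                            # observation noise std

x = rng.normal(size=(n, d))                          # prior samples
y = x @ G.T + sig * rng.normal(size=(n, m))          # simulated data
grads = (y - x @ G.T) @ G / sig**2                   # grad_x of the log-likelihood
Hmat = grads.T @ grads / n                           # E[grad log L (grad log L)^T]

eigval, eigvec = np.linalg.eigh(Hmat)
print("eigenvalues:", np.round(eigval[::-1], 1))     # two informed directions dominate
```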
- [PRSA] A low-rank ensemble Kalman filter for elliptic observations. Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff D Eldredge. Proceedings of the Royal Society A, 2022.
We propose a regularization method for ensemble Kalman filtering (EnKF) with elliptic observation operators. Commonly used EnKF regularization methods suppress state correlations at long distances. For observations described by elliptic partial differential equations, such as the pressure Poisson equation (PPE) in incompressible fluid flows, distance localization cannot be applied, as we cannot disentangle slowly decaying physical interactions from spurious long-range correlations. This is particularly true for the PPE, in which distant vortex elements couple nonlinearly to induce pressure. Instead, these inverse problems have a low effective dimension: low-dimensional projections of the observations strongly inform a low-dimensional subspace of the state space. We derive a low-rank factorization of the Kalman gain based on the spectrum of the Jacobian of the observation operator. The identified eigenvectors generalize the source and target modes of the multipole expansion, independently of the underlying spatial distribution of the problem. Given rapid spectral decay, inference can be performed in the low-dimensional subspace spanned by the dominant eigenvectors. This low-rank EnKF is assessed on dynamical systems with Poisson observation operators, where we seek to estimate the positions and strengths of point singularities over time from potential or pressure observations. We also comment on the broader applicability of this approach to elliptic inverse problems outside the context of filtering.
2021
- [AIAA] A low-rank nonlinear ensemble filter for vortex models of aerodynamic flows. Mathieu Le Provost, Ricardo Baptista, Youssef Marzouk, and Jeff Eldredge. In AIAA Scitech 2021 Forum, 2021.
Robustly estimating the separated flow about an airfoil is critical in the design of any closed-loop controller. Darakananda et al. (Phys. Rev. Fluids, 2018) successfully used an ensemble Kalman filter (EnKF) to sequentially estimate the flow using an inviscid vortex model and distributed surface pressure readings. To tackle challenging inference problems with limited observations, classical localization schemes suppress correlations at long distances. However, these techniques would be harmful in our case due to the existence of physical long-range interactions between vortices and pressure readings. Instead, these interactions are best described as interactions between clusters of variables. This work proposes a systematic procedure to identify these clusters of variables from a nonlinear observation model. By projecting the states and observations onto these new sets of variables, the inference is performed in a low-dimensional subspace of the state and the observations. To perform consistent inference with the nonlinear model, we use the stochastic map filter (SMF): a natural generalization of the EnKF that relies on interpretable nonlinear prior-to-posterior transformations (Spantini et al., arXiv, 2019). We combine the identification of these clusters of variables with the SMF to derive a low-rank nonlinear ensemble filter. This filter is assessed on the response of a translating plate at 20 degrees that undergoes strong and overlapping pulses applied near the leading edge. Our framework outperforms the EnKF at estimating the surface pressure distribution along the entire plate, with only two pressure sensors (placed at the edges of the plate) for collecting measurements. Keywords: inviscid vortex model, disturbed separated flow, data assimilation, nonlinear ensemble filter, measure transport, low-rank projections
2020
- [SEG] Bayesian seismic inversion: Measuring Langevin MCMC sample quality with kernels. Muhammad Izzatullah, Ricardo Baptista, Lester Mackey, Youssef Marzouk, and Daniel Peter. In SEG International Exposition and Annual Meeting, 2020.
The Bayesian framework is commonly used to quantify uncertainty in seismic inversion. To perform Bayesian inference, Markov chain Monte Carlo (MCMC) algorithms are regarded as the gold standard technique for sampling from the posterior probability distribution. Consistent MCMC methods struggle with complex, high-dimensional models, and most methods scale poorly to large datasets, such as those arising in seismic inversion. As an alternative, approximate MCMC methods based on unadjusted Langevin dynamics offer scalability and more rapid sampling at the cost of biased inference. However, when assessing the quality of approximate MCMC samples for characterizing the posterior distribution, most diagnostics fail to account for these biases. In this work, we introduce the kernel Stein discrepancy (KSD) as a diagnostic tool to determine the convergence of MCMC samples for Bayesian seismic inversion. We demonstrate the use of the KSD for measuring sample quality and selecting the optimal Langevin MCMC algorithm for two Gaussian Bayesian inference problems.
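A one-dimensional sketch of the diagnostic (toy N(0, 1) target with an inverse multiquadric Stein kernel; the seismic posteriors in the paper are far higher-dimensional, and the kernel parameters here are arbitrary):

```python
import numpy as np

def ksd(x, score, c=1.0, beta=-0.5):
    # V-statistic KSD with the IMQ base kernel k(x, y) = (c^2 + (x - y)^2)^beta
    u = x[:, None] - x[None, :]
    q = c**2 + u**2
    k = q**beta
    dxk = 2 * beta * u * q**(beta - 1)                 # dk/dx
    dyk = -dxk                                         # dk/dy
    dxyk = -2 * beta * q**(beta - 2) * (q + 2 * (beta - 1) * u**2)
    s = score(x)
    k0 = dxyk + s[:, None] * dyk + s[None, :] * dxk + s[:, None] * s[None, :] * k
    return np.sqrt(k0.mean())

rng = np.random.default_rng(0)
score = lambda x: -x                                   # score of the N(0, 1) target
good = rng.normal(size=500)
biased = rng.normal(loc=0.5, size=500)                 # off-target samples
print("KSD(good)  :", round(ksd(good, score), 3))
print("KSD(biased):", round(ksd(biased, score), 3))    # noticeably larger
```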
2019
- [JCP] Some greedy algorithms for sparse polynomial chaos expansions. Ricardo Baptista, Valentin Stolbunov, and Prasanth B Nair. Journal of Computational Physics, 2019.
Compressed sensing algorithms approximate functions using limited evaluations by searching for a sparse representation among a dictionary of basis functions. Orthogonal matching pursuit (OMP) is a greedy algorithm for selecting basis functions whose computational cost scales with the size of the dictionary. For polynomial chaos (PC) approximations, the size of the dictionary grows quickly with the number of model inputs and the maximum polynomial degree, making them often prohibitive to use with greedy methods. We propose two new algorithms for efficiently constructing sparse PC expansions for problems with high-dimensional inputs. The first algorithm is a parallel OMP method coupled with an incremental QR factorization scheme, wherein the model construction step is interleaved with a ν-fold cross-validation procedure. The second approach is a randomized greedy algorithm that leverages a probabilistic argument to only evaluate a subset of basis functions from the dictionary at each iteration of the incremental algorithm. The randomized algorithm is demonstrated to recover model outputs with a similar level of sparsity and accuracy as OMP, but with a cost that is independent of the dictionary size. Both algorithms are validated with a numerical comparison of their performance on a series of algebraic test problems and PDEs with high-dimensional inputs.
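For orientation, plain OMP is sketched below (this omits the paper's incremental QR factorization, parallelism, cross-validation, and randomized dictionary subsampling; the dictionary here is random Gaussian rather than polynomial chaos):

```python
import numpy as np

def omp(D, y, nnz):
    """Greedily select nnz columns of dictionary D to approximate y."""
    resid, support = y.copy(), []
    for _ in range(nnz):
        j = np.argmax(np.abs(D.T @ resid))          # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        resid = y - D[:, support] @ coef            # re-fit on support, update residual
    return support, coef

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 500))
D /= np.linalg.norm(D, axis=0)                      # unit-norm atoms
x_true = np.zeros(500); x_true[[7, 42, 300]] = [1.0, -2.0, 0.5]
y = D @ x_true
support, coef = omp(D, y, nnz=3)
print(sorted(support), np.round(coef, 2))           # recovers the sparse coefficients
```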
2018
- [ICML] Bayesian optimization of combinatorial structures. Ricardo Baptista and Matthias Poloczek. In International Conference on Machine Learning, 2018.
The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas. This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.
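A heavily simplified sketch of the model-based loop (not the paper's algorithm: the paper uses sparse Bayesian linear regression and a semidefinite-programming acquisition solver, while this toy uses ridge regression and brute-force enumeration, feasible only for small d):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d = 8
Q = rng.normal(size=(d, d))
objective = lambda x: x @ Q @ x                        # expensive black box (toy)

def features(X):
    iu = np.triu_indices(d, 1)                         # all pairwise interactions
    pair = np.einsum('ni,nj->nij', X, X)[:, iu[0], iu[1]]
    return np.hstack([X, pair])

X_all = np.array(list(product([0, 1], repeat=d)), dtype=float)
X = X_all[rng.choice(len(X_all), size=10, replace=False)]   # initial design
y = np.array([objective(x) for x in X])

for _ in range(20):
    Phi = features(X)                                  # fit second-order surrogate
    w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(Phi.shape[1]), Phi.T @ y)
    scores = features(X_all) @ w                       # surrogate on every candidate
    seen = (X_all[:, None, :] == X[None, :, :]).all(-1).any(-1)
    scores[seen] = np.inf                              # skip already-evaluated points
    xn = X_all[np.argmin(scores)]                      # greedy acquisition (minimize)
    X = np.vstack([X, xn])
    y = np.append(y, objective(xn))

print("best value found:", y.min().round(3))
```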
- [AIAA] Optimal approximations of coupling in multidisciplinary models. Ricardo Baptista, Youssef Marzouk, Karen Willcox, and Benjamin Peherstorfer. AIAA Journal, 2018.
This paper presents a methodology for identifying important discipline couplings in multicomponent engineering systems. Coupling among disciplines contributes significantly to the computational cost of analyzing a system and can become particularly burdensome when coupled analyses are embedded within a design or optimization loop. In many cases, disciplines may be weakly coupled, so that some of the coupling or interaction terms can be neglected without significantly impacting the accuracy of the system output. Typical practice derives such approximations in an ad hoc manner using expert opinion and domain experience. This work proposes a new approach that formulates an optimization problem to find a model that optimally balances accuracy of the model outputs with the sparsity of the discipline couplings. An adaptive sequential Monte Carlo sampling-based technique is used to efficiently search the combinatorial model space of different discipline couplings. An algorithm for selecting an optimal model is presented and illustrated in a fire-detection satellite model and a turbine engine cycle analysis model.
2017
- [NeurIPS] Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting. Rebecca E Morrison, Ricardo Baptista, and Youssef Marzouk. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
We present an algorithm to identify sparse dependence structure in continuous and non-Gaussian probability distributions, given a corresponding set of data. The conditional independence structure of an arbitrary distribution can be represented as an undirected graph (or Markov random field), but most algorithms for learning this structure are restricted to the discrete or Gaussian cases. Our new approach allows for more realistic and accurate descriptions of the distribution in question, and in turn better estimates of its sparse Markov structure. Sparsity in the graph is of interest as it can accelerate inference, improve sampling methods, and reveal important dependencies between variables. The algorithm relies on exploiting the connection between the sparsity of the graph and the sparsity of transport maps, which deterministically couple one probability measure to another.