Beyond Bayesian model averaging over paths in probabilistic programs with stochastic support
Abstract:
The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as BMA weights can be unstable due to model misspecification or inference approximations, which in turn leads to sub-optimal predictions. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find them to be more robust and to lead to better predictions than the default BMA weights.
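As a rough illustration of the post-processing idea, the sketch below re-weights paths by stacking: it takes per-path log predictive densities on held-out data, obtained from any existing inference engine, and maximizes the held-out log score over the simplex. The data and function names are invented for illustration, and the objective is the generic stacking criterion of Yao et al. (2018) rather than the paper's exact procedure.

import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp, softmax

def stacking_weights(log_pred):
    """log_pred[n, k] = log predictive density of held-out point n under path k."""
    n_paths = log_pred.shape[1]

    def neg_log_score(z):
        # Unconstrained parametrisation of simplex weights via a softmax.
        log_w = np.log(softmax(z))
        return -np.sum(logsumexp(log_pred + log_w, axis=1))

    res = minimize(neg_log_score, np.zeros(n_paths), method="L-BFGS-B")
    return softmax(res.x)

# Toy example: three paths, 100 held-out points, the second path fitting best.
rng = np.random.default_rng(0)
log_pred = rng.normal(loc=[-2.0, -1.0, -3.0], scale=0.3, size=(100, 3))
print(stacking_weights(log_pred))  # concentrates almost all weight on the second path

Either alternative weighting scheme reuses the per-path inference results unchanged, which is what makes the post-processing step cheap.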

Expectation Programming: Adapting Probabilistic Programming Systems to Estimate Expectations Efficiently
Abstract:
We show that the standard computational pipeline of probabilistic programming systems (PPSs) can be inefficient for estimating expectations and introduce the concept of expectation programming to address this. In expectation programming, the aim of the backend inference engine is to directly estimate expected return values of programs, as opposed to approximating their conditional distributions. This distinction, while subtle, allows us to achieve substantial performance improvements over the standard PPS computational pipeline by tailoring computation to the expectation we care about. We realize a particular instance of our expectation programming concept, Expectation Programming in Turing (EPT), by extending the PPS Turing to allow so-called target-aware inference to be run automatically. We then verify the statistical soundness of EPT theoretically, and show that it provides substantial empirical gains in practice.
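To give a flavour of what tailoring computation to a specific expectation can mean, here is a deliberately simple, invented one-dimensional example (the model, proposal, and threshold are all made up; this is not EPT code). The standard pipeline samples the posterior and averages the integrand, whereas a target-aware scheme estimates the numerator and denominator of the expectation separately with proposals chosen for the specific integrand, which pays off most visibly for rare-event expectations.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
post = stats.norm(0.0, 1.0)               # stand-in for an exactly known posterior

def f(th):
    return (th > 4.0).astype(float)       # target expectation: a tail probability

truth = post.sf(4.0)
n = 10_000

# Standard pipeline: sample the posterior, then average f. Very few draws
# (often none) land in the tail, so the estimate is extremely noisy.
naive = f(post.rvs(n, random_state=rng)).mean()

# Target-aware alternative: estimate the numerator with a proposal shifted into
# the region where f is non-zero; the denominator is trivial here because the
# posterior is already normalised.
q_num = stats.norm(4.5, 1.0)
th = q_num.rvs(n, random_state=rng)
numerator = np.mean(f(th) * post.pdf(th) / q_num.pdf(th))
target_aware = numerator / 1.0

print(truth, naive, target_aware)

In EPT, this kind of target-aware inference is run automatically for the expectation the user specifies, rather than being hand-derived as in this sketch.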

Rethinking Variational Inference for Probabilistic Programs with Stochastic Support
Abstract:
We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem rely on designing a single global variational guide on a variable-by-variable basis, while maintaining the stochastic control flow of the original program. SDVI instead breaks the program down into sub-programs with static support, before automatically building separate sub-guides for each. This decomposition significantly aids in the construction of suitable variational families, enabling, in turn, substantial improvements in inference performance.
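For concreteness, the following invented toy program (plain Python, not code from the paper or any particular PPS) shows the kind of stochastic control flow in question: a discrete draw decides which branch runs, so the two paths instantiate different numbers of latent variables, yet each path on its own has static support.

import numpy as np

rng = np.random.default_rng(0)

def program():
    k = rng.binomial(1, 0.3)        # stochastic control flow: the path choice
    if k == 0:                      # path 0: a single latent variable
        x = rng.normal(0.0, 1.0)
        y = rng.normal(x, 0.5)
        return {"path": 0, "latents": (x,), "obs": y}
    else:                           # path 1: two latent variables
        x1 = rng.normal(0.0, 1.0)
        x2 = rng.normal(0.0, 1.0)
        y = rng.normal(x1 * x2, 0.5)
        return {"path": 1, "latents": (x1, x2), "obs": y}

# Conditioned on the path, each sub-program has a fixed set of latent variables,
# so a sub-guide of the right dimensionality can be fit to each one; the overall
# guide is then a mixture of these sub-guides.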

Automating Bayesian computation for stochastic simulators with probabilistic programming
Abstract:
Probabilistic programming systems (PPSs) automate the process of running Bayesian inference in stochastic simulator models. These stochastic simulators are ubiquitous in science and engineering: climate researchers build earth system models to predict future climate change; particle physicists build simulators to understand the experimental outcomes of particle colliders; and epidemiologists build models to predict how diseases spread. PPSs give us a principled way to incorporate these simulators into our decision-making process by enabling us to calibrate them to observed data using the tools of Bayesian inference. However, to do so, PPS inference algorithms need to deal with all the complexities of modern programming languages. Importantly for this thesis, modern PPSs often permit the use of stochastic control flow, leading to so-called programs with stochastic support: programs in which the number and type of latent variables are no longer fixed.
We will make the argument for treating these programs as mixtures over program paths. Using this breakdown, we derive a new variational inference algorithm that we term Support Decomposition Variational Inference (SDVI). In contrast to prior work, which constructs the variational family on a variable-by-variable basis, SDVI constructs the guide as a mixture over program paths, building a separate variational distribution for each path independently. This allows us to bring advances in variational inference from the static support setting to the stochastic support setting.
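In notation chosen here purely for illustration (writing A_k for the event that execution follows the k-th path, x_{A_k} for the latent variables instantiated on that path, and y for the observations), the decomposition and the corresponding mixture guide take the form

\[
  p(x \mid y) \;=\; \sum_{k} p(A_k \mid y)\, p\big(x_{A_k} \mid y, A_k\big),
  \qquad
  q(x) \;=\; \sum_{k} w_k\, q_{\phi_k}\big(x_{A_k}\big),
\]

where each sub-guide q_{\phi_k} only has to cover the static support of its own path, so variational families and optimizers from the static support setting can be applied to each sub-problem independently.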
The breakdown of the program into a mixture over paths does not only help us derive new inference algorithms. We will also use it to investigate the properties of the posterior distribution more generally. Specifically, we show that the weights assigned to individual program paths can often be unstable, a problem that can arise due either to model misspecification or to inference approximations. These instabilities make it harder to replicate results and can give the user misleading confidence in their model's inferences. To alleviate these issues, we will propose alternative mechanisms for weighting the program paths that instead optimize the path weights against predictive objectives.
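To make the source of the instability concrete, in the same illustrative notation as above: under the full posterior, path k is weighted by its posterior path probability,

\[
  w_k^{\mathrm{BMA}} \;=\; p(A_k \mid y)
  \;=\; \frac{p(A_k)\, p(y \mid A_k)}{\sum_{j} p(A_j)\, p(y \mid A_j)},
\]

so the weights hinge on per-path marginal likelihoods, which typically have to be approximated and can shift sharply under model misspecification; the alternative mechanisms instead choose the path weights by optimizing a predictive objective directly.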
Many PPSs focus on the goal of automating inference; however, it is also important to consider how the outcomes of inference are used in practice. Many workflows use the outputs of inference engines to estimate downstream expectations. To facilitate this use case, we will introduce the concept of expectation programming, which allows users to directly define and estimate expectations in a target-aware manner, meaning that the backend computation engine specifically tailors the estimation algorithm towards a user-specified expectation.