Pigeons

Summary

Pigeons is a Julia package to approximate challenging posterior distributions, and more broadly, Lebesgue integration problems. Pigeons can be used in a multi-threaded context, and/or distributed over hundreds or thousands of MPI-communicating machines.

Pigeons supports many different ways to specify integration/expectation problems and provides rich and configurable output.

At its core, Pigeons provides a distributed and parallel implementation of the following algorithms:

Non-Reversible Parallel Tempering (NRPT), Syed et al., 2021.
Variational parallel tempering (Variational PT), Surjanovic et al., 2022.
autoMALA, Biron-Lattes et al., 2024.

These algorithms achieve state-of-the-art performance for approximation of challenging probability distributions.

Installing Pigeons

If you have not done so, install Julia. Julia 1.8 and higher are supported.
Install Pigeons using

using Pkg; Pkg.add("Pigeons")

Basic usage

Specify the target distribution and, optionally, parameters such as the random seed, by creating an Inputs:

using Pigeons

inputs = Inputs(target = toy_mvn_target(100))

Inputs{Pigeons.ScaledPrecisionNormalPath, Nothing, Nothing, Nothing, Nothing}(Pigeons.ScaledPrecisionNormalPath(1.0, 10.0, 100), 1, 10, 10, 0, nothing, nothing, false, Function[Pigeons.log_sum_ratio, Pigeons.timing_extrema, Pigeons.allocation_extrema], 0, false, nothing, nothing, true, false)

Have a look at the Inputs documentation for an overview of the many options available to configure Pigeons. There you will find information on setting the random seed, controlling the number of iterations (via n_rounds), and many more options.
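As a minimal sketch of such a configuration, the call below sets a few options explicitly. The keyword names seed and n_chains are assumed to match the fields listed in the Inputs documentation (n_rounds is the one named above), and the values are arbitrary, chosen only for illustration.

```julia
using Pigeons

# Assumed keyword names (check the Inputs documentation): seed, n_rounds, n_chains.
inputs = Inputs(
    target   = toy_mvn_target(100), # same 100-dimensional toy target as above
    seed     = 2,                   # random seed (illustrative value)
    n_rounds = 5,                   # number of rounds (illustrative value)
    n_chains = 10,                  # number of PT chains (illustrative value)
)
```

Any option left out keeps its default, so only the settings that differ from the defaults need to be listed.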

Then, run parallel tempering (PT) locally on one process using the function pigeons():

pt = pigeons(inputs);

┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2        7.5   2.42e-05    1.1e+04       -122   2.66e-15      0.167
        4       5.56   3.28e-05   1.18e+04       -119    2.6e-07      0.382
        8       5.82   5.33e-05   1.61e+04       -114    0.00137      0.353
       16       7.02   9.91e-05   2.15e+04       -116   0.000205      0.219
       32       7.14   0.000187   2.76e+04       -112     0.0379      0.206
       64       7.18   0.000377   3.96e+04       -116     0.0442      0.202
      128        7.2   0.000721   5.86e+04       -114      0.142        0.2
      256       7.21    0.00139   5.66e+04       -115      0.151      0.199
      512       7.14    0.00276   7.45e+04       -115       0.19      0.207
 1.02e+03       7.12     0.0054   7.02e+04       -115      0.173      0.209
────────────────────────────────────────────────────────────────────────────

This runs PT on a 100-dimensional MVN toy example with 10 chains for $2046 = 2^{11} - 2$ iterations (the sum of the scans column, $2 + 4 + \dots + 2^{10}$), and returns a PT struct containing the results of this run (how to access the information inside a PT struct is covered later). Each line in the output provides information on one round; the number of iterations per round doubles at each round, and adaptation is performed between rounds.
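The doubling schedule can be checked with a few lines of plain Julia that do not depend on Pigeons. The sketch below rebuilds the scans column of the report from the number of rounds (10 in the run above) and sums it; the variable names are introduced here for illustration only.

```julia
# Scans per round double: 2, 4, ..., 2^n_rounds (n_rounds = 10 matches the report above).
n_rounds = 10
scans_per_round = [2^r for r in 1:n_rounds]

total_scans = sum(scans_per_round)

# Geometric series: 2 + 4 + ... + 2^n = 2^(n + 1) - 2.
@assert total_scans == 2^(n_rounds + 1) - 2

println(scans_per_round[end])  # 1024, shown as 1.02e+03 in the report
println(total_scans)           # 2046
```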

Since the above two Julia lines are the most common operations in this package, creating inputs and running PT can be done in one line as follows:

pt = pigeons(target = toy_mvn_target(100));

┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2        7.5    2.2e-05    1.1e+04       -122   2.66e-15      0.167
        4       5.56   3.33e-05   1.18e+04       -119    2.6e-07      0.382
        8       5.82   5.64e-05   1.61e+04       -114    0.00137      0.353
       16       7.02   0.000101   2.15e+04       -116   0.000205      0.219
       32       7.14   0.000188   2.76e+04       -112     0.0379      0.206
       64       7.18   0.000361   3.96e+04       -116     0.0442      0.202
      128        7.2   0.000722   5.86e+04       -114      0.142        0.2
      256       7.21     0.0014   5.66e+04       -115      0.151      0.199
      512       7.14    0.00279   7.45e+04       -115       0.19      0.207
 1.02e+03       7.12    0.00541   7.02e+04       -115      0.173      0.209
────────────────────────────────────────────────────────────────────────────

Here, the keyword arguments passed to pigeons are forwarded to Inputs.
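As a sketch of this forwarding, the call below passes several Inputs options directly to pigeons. The record value is the one suggested by the Info message in the output above. The use of get_sample to retrieve the recorded draws is an assumption based on the output-handling documentation, and the smaller dimension and round count are illustrative choices.

```julia
using Pigeons

# Keyword arguments are forwarded to Inputs; `record = [traces; record_default()]`
# is the recorder setting suggested by the Info message above.
pt = pigeons(
    target   = toy_mvn_target(10),   # smaller toy target, for illustration
    n_rounds = 4,                    # illustrative value
    record   = [traces; record_default()],
);

# Assumption: get_sample returns the recorded draws for the target chain;
# see the documentation on manipulating the output of pigeons.
samples = get_sample(pt)
println(length(samples))
```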

Continuing the above example, to perform two additional rounds of sampling, use the following (see also the more advanced checkpoint/resume options on the checkpoint page):

pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt);

┌ Info: Neither traces, disk, nor online recorders included.
│    You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└    To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
└ @ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.16     0.0108   6.91e+04       -115      0.186      0.204
  4.1e+03       7.17     0.0214   6.91e+04       -115      0.184      0.203
────────────────────────────────────────────────────────────────────────────
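As a rough sanity check on the log(Z₁/Z₀) column, the ratio can be computed in closed form under one stated assumption: that the printed ScaledPrecisionNormalPath(1.0, 10.0, 100) describes a path from an unnormalized isotropic normal with precision 1 (the reference) to one with precision 10 (the target) in 100 dimensions. This reading of the three numbers is an assumption and is not stated in the text above. The helper log_Z is hypothetical, written only for this check.

```julia
# Assumption: gamma_i(x) = exp(-0.5 * prec_i * sum(abs2, x)) in dimension d,
# so that Z_i = (2π / prec_i)^(d / 2).
log_Z(prec, d) = (d / 2) * log(2π / prec)

d = 100
prec_ref, prec_target = 1.0, 10.0

# log(Z₁/Z₀) = -(d / 2) * log(prec_target / prec_ref)
log_ratio = log_Z(prec_target, d) - log_Z(prec_ref, d)
println(round(log_ratio, digits = 2))  # -115.13
```

Under this assumption the closed-form value agrees with the estimate of about -115 reported in the later rounds.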

Scope

We describe here the class of problems that can be approached using Pigeons. In summary: computational Lebesgue integration.

Let $\pi(x)$ denote a probability density called the target. In many problems, e.g. in Bayesian statistics, the density $\pi$ is known only up to a normalization constant,

\[\pi(x) = \frac{\gamma(x)}{Z},\]

where $\gamma$ can be evaluated pointwise, but $Z$ is unknown. Pigeons takes as input the function $\gamma$.
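
For concreteness, the normalization constant is the integral of $\gamma$ over the state space. In a Bayesian model, taking $\gamma$ to be the prior density times the likelihood makes $Z$ the marginal likelihood. The symbols $p$, $L$, and $y$ (prior, likelihood, and observed data) are introduced here only for this illustration:

\[Z = \int \gamma(x) \, \mathrm{d}x, \qquad \gamma(x) = p(x) \, L(y \mid x) \;\Longrightarrow\; Z = \int p(x) \, L(y \mid x) \, \mathrm{d}x = p(y).\]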

Terminology: log_potential

Since we work in log-scale, we use the terminology log_potential as a shorthand for the unnormalized log density $\log \gamma(x)$. See the informal interface log_potential.
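
To illustrate what such an unnormalized log density can look like, the sketch below defines a callable Julia struct that returns $\log \gamma(x)$ for an isotropic normal without its normalization constant. The type name MyLogPotential and its field are hypothetical. Using such an object as a Pigeons target may require further pieces (for example, an initialization and a reference) that are described in the interface documentation and are not shown here.

```julia
# Hypothetical example type: an unnormalized isotropic normal with a given precision.
struct MyLogPotential
    precision::Float64
end

# Calling the object on a state x returns log γ(x), omitting the normalization constant.
(lp::MyLogPotential)(x) = -0.5 * lp.precision * sum(abs2, x)

lp = MyLogPotential(10.0)
println(lp([1.0, 1.0]))  # -0.5 * 10 * (1 + 1) = -10.0
```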

Pigeons' outputs can be used for two tasks:

Approximating expectations of the form $E[f(X)]$, where $X \sim \pi$. For example, the choice $f(x) = x$ computes the mean, and $f(x) = I[x \in A]$ computes the probability of $A$ under $\pi$. See manipulating the output of pigeons.
Approximating the value of the normalization constant $Z$. For example, in Bayesian statistics, this corresponds to the marginal likelihood. See approximation of the normalization constant.
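
The first task reduces to averaging $f$ over samples. The sketch below uses independent standard normal draws as a stand-in for samples from $\pi$; this is an assumption made so the example does not depend on Pigeons. It estimates the mean with $f(x) = x$ and the probability of $A = [0, 1]$ with the indicator function.

```julia
using Random, Statistics

rng = MersenneTwister(1)
# Stand-in draws from a standard normal, playing the role of samples X ~ π.
samples = randn(rng, 100_000)

mean_estimate = mean(samples)                    # f(x) = x
prob_estimate = mean(x -> 0 <= x <= 1, samples)  # f(x) = I[x ∈ A], with A = [0, 1]

println(mean_estimate)  # close to the true mean, 0
println(prob_estimate)  # close to the true probability, about 0.341
```

The same averages apply to samples produced by an MCMC method, although correlated draws generally give a larger Monte Carlo error than independent ones for the same sample size.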

Pigeons shines in the following scenarios:

When the posterior density $\pi$ is challenging due to non-convexity and/or concentration on a sub-manifold due to unidentifiability.
When the user needs not only $E[f(X)]$ but also $Z$. Many existing MCMC tools focus on the former and struggle to do the latter in high-dimensional problems.
When the posterior density $\pi$ is defined over a non-standard state-space, e.g. a combinatorial object such as a phylogenetic tree. See defining custom explorers and targeting non-Julian models.

How to cite Pigeons

Our team works hard to maintain and improve the Pigeons package. Please consider citing our work by referring to our Pigeons paper.

BibTeX code for citing Pigeons

@article{surjanovic2023pigeons,
  title={Pigeons.jl: {D}istributed sampling from intractable distributions},
  author={Surjanovic, Nikola and Biron-Lattes, Miguel and Tiede, Paul and Syed, Saifuddin and Campbell, Trevor and Bouchard-C{\^o}t{\'e}, Alexandre},
  journal={arXiv:2308.09769},
  year={2023}
}

APA

Surjanovic, N., Biron-Lattes, M., Tiede, P., Syed, S., Campbell, T., & Bouchard-Côté, A. (2023). Pigeons.jl: Distributed sampling from intractable distributions. arXiv:2308.09769.