This page provides an overview of Non-Reversible Parallel Tempering (PT) (Syed et al., 2021), linking it with key parts of the code base.
Read this page if you are interested in extending Pigeons or in understanding how it works under the hood. Reading this page is not required to use Pigeons; for that, refer instead to the user guide.
PT augmented state space, replicas
Let $X_n$ denote a Markov chain on state space $\mathscr{X}$ with stationary distribution $\pi$. PT is a Markov chain defined on the augmented state space $\mathscr{X}^N$, hence a state has the form $\boldsymbol{X} = (X^{(1)}, X^{(2)}, \dots, X^{(N)})$. Each component of $\boldsymbol{X}$ is stored in a struct called a `Replica`.
The vector of replicas $\boldsymbol{X}$ is stored via the informal interface `replicas`. When PT runs on one computer, `replicas` is implemented with a `Vector{Replica}`. When PT runs distributed across several communicating machines, `replicas` is implemented via `EntangledReplicas`, which stores the parts of $\boldsymbol{X}$ that are local to that machine as well as the data structures required to communicate with the other machines.
Internally, PT operates on a discrete set of distributions, $\pi_1, \pi_2, \dots, \pi_N$, where $N$ can be obtained using `n_chains()`. We use the terminology chain to refer to an index $i$ of $\pi_i$. Typically, $\pi_N$ coincides with the distribution of interest $\pi$ (called the "target"), while $\pi_1$ is a tractable approximation that will help PT efficiently explore the state space (called the "reference"). More broadly, we assume that a subset of the chains (determined by `is_target()`) coincides with the target, and that a subset of the chains (determined by `is_reference()`) supports efficient exploration, such as i.i.d. sampling or a rapidly mixing kernel.
PT is designed so that its stationary distribution is $\boldsymbol{\pi} = \pi_1 \times \pi_2 \times \dots \times \pi_N$. As a result, restricting each sample to the component corresponding to $\pi_N = \pi$ and applying an integrable function $f$ to each will lead, under weak assumptions, to Monte Carlo averages that converge to the expectation of interest, $E[f(X)]$ for $X \sim \pi$.
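For example, such a Monte Carlo average can be computed by recording the states visited by the target chain with the built-in `traces` recorder and extracting them with `get_sample`; this is a minimal sketch using the toy target `toy_mvn_target`:

```julia
using Pigeons, Statistics

# Record the states visited by the target chain.
pt = pigeons(target = toy_mvn_target(1), record = [traces])

# Monte Carlo average of f(x) = x[1]^2 over the target-chain samples,
# approximating E[f(X)] for X ~ π.
samples  = get_sample(pt)
estimate = mean(x -> x[1]^2, samples)
```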
Outline of local exploration and communication
PT alternates between two phases, each $\boldsymbol{\pi}$-invariant: the local exploration phase and the communication phase. Informally, the first phase attempts to achieve mixing for the univariate statistics $\pi_i(X^{(i)})$, while the second phase attempts to translate the good mixing of these univariate statistics into global mixing of $X^{(i)}$ by leveraging the reference distribution(s).
Local exploration
In the local exploration phase, each `Replica`'s state is modified using a $\pi_i$-invariant kernel, where $i$ is given by `Replica.chain`. Often, `Replica.chain` corresponds to an annealing parameter $\beta_i$, but this need not be the case (see e.g. Baragatti et al., 2011). The kernel can either modify `Replica.state` in-place, or replace the `Replica`'s `state` field. The key interface controlling local exploration, `explorer`, is described in more detail below.
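As an illustration, one simple $\pi_i$-invariant kernel is a random-walk Metropolis step. The sketch below is not the actual Pigeons code; `log_density(i, x)` is a hypothetical helper returning $\log \pi_i(x)$ up to an additive constant, and the state is assumed to be a vector:

```julia
# Sketch: one random-walk Metropolis step, invariant with respect to π_i,
# applied to a single replica (replaces the replica's `state` field).
function local_exploration_step!(replica, log_density; step = 0.5)
    i = replica.chain   # index of the distribution this replica currently targets
    proposal = replica.state .+ step .* randn(length(replica.state))
    log_ratio = log_density(i, proposal) - log_density(i, replica.state)
    if log(rand()) < log_ratio
        replica.state = proposal   # accept; otherwise keep the current state
    end
end
```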
Communication
In the communication phase, PT proposes swaps between pairs of replicas. These swaps allow each replica's state to periodically visit reference chains. During these reference visits, the state can move around the space quickly. In principle, there are two equivalent ways to perform a swap: the `Replica`s could exchange their `state` fields, or alternatively, they could exchange their `chain` fields. Since we provide distributed implementations, we use the latter, as it ensures that the amount of data that needs to be exchanged between two machines during a swap can be made very small (two floats). Remarkably, this cost does not vary with the dimensionality of the state space, in contrast to the naive implementation, which would transmit states over the network. See Distributed PT for more information on our distributed implementation.
Both in distributed and single-process mode, swaps are performed using the function `swap!()`.
The key interface controlling communication, `tempering`, is described in more detail below.
A tour of the PT meta-algorithm
A generalized version of Algorithm 1 ("one round of PT") in Syed et al., 2021 is implemented in Pigeons in `run_one_round!()`, while the complete algorithm ("several adaptive rounds"), Algorithm 4 of Syed et al., 2021, has a generalized implementation in `pigeons()`.
In the following we discuss different facets of these (meta-)algorithms.
Storage in PT algorithms
The information stored during the execution of `pigeons()` is grouped in the struct `PT`. Its key fields point to a `replicas` and to a `Shared`. Briefly, `replicas` stores information that is distinct in each MPI process and is read-write during each round, while `Shared` is identical in all MPI processes, read-only during a round, and updated only between rounds.
The creation of `PT` structs is orchestrated by `Inputs`, which fully determines the execution of a PT algorithm (target distribution, random seed, etc.).
Collecting statistics: `recorder` and `recorders`
Two steps are needed to collect statistics from the execution of a PT algorithm:
- Specify which statistics to collect using one or several `recorder_builder`s. By default, only statistics that can be computed in constant memory are included; those with growing memory consumption, e.g. tracking the full index process as done here, need to be explicitly specified in advance.
- At the end of `run_one_round!()`, `reduce_recorders!()` is called to compile the statistics collected by the different replicas.
An object responsible for accumulating all the different types of statistics for one replica is called a `recorders`. An object accumulating one type of statistic for one replica is a `recorder`. Each replica has its own `recorders` to ensure thread safety (e.g., see the use of a parallel local exploration phase using `@threads` in `explore!()`) and to enable distributed computing.
Using a built-in recorder
For the list of built-in implementations of `recorder`, see the section "Examples of functions.." at `recorder`.
To use a `recorder`, specify it in the vector argument `recorder_builders` in `Inputs`. For example, to signal that you want to save the full index process, use:
```julia
using Pigeons
pt = pigeons(target = toy_mvn_target(1), record = [index_process]);
```

```
┌ Info: Neither traces, disk, nor online recorders included.
│   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
└   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
──────────────────────────────────────────────────────
  scans         Λ     log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ──────────
        2      0.417      -1.56      0.782      0.954
        4      0.564     -0.653       0.61      0.937
        8      0.597     -0.833      0.869      0.934
       16      0.686      -1.35      0.729      0.924
       32      0.818      -1.19        0.8      0.909
       64      0.698      -1.23      0.838      0.922
      128      0.751      -1.17      0.844      0.917
      256      0.722      -1.15      0.885       0.92
      512      0.757      -1.19      0.895      0.916
 1.02e+03      0.744      -1.14      0.908      0.917
──────────────────────────────────────────────────────
```
You can then access the index process via:

```julia
pt.reduced_recorders.index_process
```

```
Dict{Int64, Vector{Int64}} with 10 entries:
  5  => [9, 10, 10, 9, 8, 7, 6, 5, 4, 3  …  3, 3, 2, 1, 1, 2, 3, 4, 5, 5]
  4  => [8, 7, 6, 5, 4, 3, 2, 1, 1, 2  …  6, 5, 4, 4, 5, 6, 7, 8, 9, 9]
  6  => [6, 5, 4, 3, 2, 1, 1, 2, 3, 4  …  9, 10, 10, 9, 8, 7, 6, 5, 4, 3]
  7  => [5, 6, 7, 8, 9, 10, 10, 10, 10, 9  …  10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
  2  => [3, 4, 5, 6, 7, 8, 9, 9, 8, 7  …  8, 7, 6, 5, 4, 3, 2, 1, 1, 2]
  10 => [4, 3, 2, 1, 1, 2, 3, 4, 5, 6  …  1, 2, 3, 3, 2, 1, 1, 2, 3, 4]
  9  => [10, 9, 8, 7, 6, 5, 4, 3, 2, 1  …  4, 4, 5, 6, 7, 8, 9, 10, 10, 10]
  8  => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  2, 1, 1, 2, 3, 4, 5, 6, 7, 8]
  3  => [2, 1, 1, 2, 3, 4, 5, 6, 7, 8  …  7, 8, 9, 10, 10, 9, 8, 7, 6, 6]
  1  => [7, 8, 9, 10, 10, 9, 8, 7, 6, 5  …  5, 6, 7, 8, 9, 10, 10, 9, 8, 7]
```
Creating your own recorder
The following pieces are needed:
- Pick or create a struct `MyStruct` that will hold the information.
- Implement all the methods in the section "Contract" of `recorder`, making sure to type the recorder argument as `recorder::MyStruct`. Some examples are in the same source file as `recorder` and/or in the same folder as `recorder.jl`.
- Create a `recorder_builder`, which is simply a function that, when called with zero arguments, creates your desired type, i.e. `MyStruct`. The name of this function will define the name of your `recorder`.
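As a hypothetical sketch of these pieces (the method names `record!` and `combine!` below are assumptions for illustration; the authoritative list is the "Contract" section of `recorder`):

```julia
# A custom recorder tracking the running maximum of a scalar statistic.
mutable struct RunningMax
    value::Float64
end

# The recorder builder: a zero-argument function whose name ("running_max")
# becomes the name of the recorder.
running_max() = RunningMax(-Inf)

# Accumulate one observation into the recorder.
record!(recorder::RunningMax, x) = (recorder.value = max(recorder.value, x))

# Merge the statistics collected by two replicas.
combine!(r1::RunningMax, r2::RunningMax) = (r1.value = max(r1.value, r2.value))
```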
Local explorer
Typical target distributions are expected to take care of building their own explorers, so most users will not need to write one. But for non-standard targets, it is useful to be able to do so.
Building a new explorer is done as follows. First, suppose you are planning to use a non-standard target of type `MyTargetType`. Then:
- Pick or create a struct `MyExplorerStruct` that may contain adaptation information, such as step sizes for HMC or proposal bandwidths. Note that explorers will need to explore not only the target distribution $\pi$ but also the intermediate ones $\pi_i$.
- Implement all the methods in the section "Contract" of `explorer`, making sure to type the explorer argument as `explorer::MyExplorerStruct`. Some examples are in the same folder as the source file of `explorer`.
- Define a method `default_explorer(target::MyTargetType)` which should return a fresh `MyExplorerStruct` instance.
One explorer struct is shared by all threads, so it should be read-only during the execution of `run_one_round!()`. It can be adapted between rounds.
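Schematically, with `MyTargetType` and `MyExplorerStruct` as placeholder names from the steps above (the contract methods themselves are omitted; see the "Contract" section of `explorer`):

```julia
struct MyTargetType end   # placeholder non-standard target type

# Explorer struct holding adaptation information, shared read-only by all threads.
struct MyExplorerStruct
    step_size::Float64    # e.g., an HMC step size or a proposal bandwidth
end

# Hook used by Pigeons to build a fresh explorer for this target type.
Pigeons.default_explorer(target::MyTargetType) = MyExplorerStruct(0.1)
```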
Tempering
Customizing `communicate!()` follows the same general steps as creating a custom explorer:
- Pick or create a struct `MyTemperingStruct` that may contain adaptation information, such as schedule optimization state.
- Implement all the methods in the section "Contract" of `tempering`, making sure to type the tempering argument as `tempering::MyTemperingStruct`. For example, see `NonReversiblePT`.
- Initial construction of the tempering is done via `create_tempering()`.