Checkpoints

Pigeons can periodically write a "checkpoint" to ensure that no more than half of the work is lost in the event of, e.g., a server failure. Checkpointing is enabled as follows:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-05-13-17-48-26-WS6s5kNc")
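
As a rough check of the "half of the work" claim, the sketch below assumes that round n performs 2^n scans and that a checkpoint is written at the end of every round. Both are assumptions made here for illustration, although the roughly doubling scan counts in the reports further down this page are consistent with them. The helper names are hypothetical and are not part of Pigeons.

```julia
# Assumption: round n performs 2^n scans, and a checkpoint is written when a round ends.
scans_in_round(n) = 2^n

# Scans already protected by a checkpoint when round n starts.
saved_scans(n) = sum(scans_in_round(k) for k in 1:(n - 1); init = 0)

# Worst case: the failure hits just before round n's checkpoint is written,
# so all of round n is lost but every earlier round is recoverable.
function worst_case_lost_fraction(n)
    lost = scans_in_round(n)
    return lost / (saved_scans(n) + lost)
end

for n in (2, 5, 10)
    println("round ", n, ": worst-case fraction lost = ", worst_case_lost_fraction(n))
end
```

Under these assumptions the worst-case fraction is 2^n / (2^(n+1) - 2), which sits slightly above one half in early rounds and tends to one half as n grows.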

See write_checkpoint() for details of how this is accomplished in a way compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint, with the latest run in results/latest/[unique folder]/round=[x]/checkpoint.
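
To see which rounds have a checkpoint on disk, the layout described above can be inspected with plain Julia file functions. The following is a minimal sketch; checkpoint_dir and checkpointed_rounds are hypothetical helpers written for this page, not functions provided by Pigeons, and they rely only on the round=[x]/checkpoint naming stated above.

```julia
# Hypothetical helpers (not part of Pigeons), based on the folder layout described above.

# Path of the checkpoint folder for round n inside an execution folder.
checkpoint_dir(exec_folder, n) = joinpath(exec_folder, "round=$n", "checkpoint")

# Rounds for which a checkpoint folder exists, in increasing order.
function checkpointed_rounds(exec_folder)
    rounds = Int[]
    for name in readdir(exec_folder)
        m = match(r"^round=(\d+)$", name)
        m === nothing && continue            # skip anything that is not a round folder
        n = parse(Int, m.captures[1])
        isdir(checkpoint_dir(exec_folder, n)) && push!(rounds, n)
    end
    return sort(rounds)                      # numeric sort, so round=10 comes after round=2
end
```

For example, checkpointed_rounds("results/latest") would list the rounds available in the most recent run, assuming that folder exists in the current working directory.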

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results in one interactive session.
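
A minimal sketch of that MPI workflow follows. It assumes that pigeons accepts the on keyword to select a ChildProcess, and it reuses the result.exec_folder field and the PT(path) constructor that appear elsewhere on this page. Treat it as an illustration rather than the only way to load results.

```julia
using Pigeons

# Run PT in a child process using 2 local MPI processes.
# checkpoint = true is what makes the results recoverable afterwards.
result = pigeons(
    target = toy_mvn_target(100),
    checkpoint = true,
    on = ChildProcess(n_local_mpi_processes = 2))

# Rebuild the full PT struct in the current interactive session
# from the checkpoint stored in the execution folder.
pt = PT(result.exec_folder)
```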

To load a checkpoint, create a PT struct by passing in the path to the checkpoint folder as a string. For example, to reload the latest checkpoint from the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0111   1.03e+05       -115      0.188      0.201
  4.1e+03       7.16     0.0218   7.79e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

Resuming from a checkpoint is also useful when you realize partway through that you need more CPUs or machines. Continuing the above example, we now do one more round, this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.369   3.23e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19       0.47   6.29e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful to have all the machines write it to the checkpoint at each round. To avoid this, wrap the large immutable object in an Immutable struct.
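
The wrapping step can be sketched as follows. MyTarget is a hypothetical type used only for illustration, and the sketch assumes that Immutable is constructed by passing the object to wrap; consult the Immutable docstring for how the wrapped object is accessed afterwards.

```julia
using Pigeons

# Hypothetical target type, for illustration only; a real target would also
# need the usual methods Pigeons expects (log density, initialization, ...).
struct MyTarget{D}
    data::D          # large object that never changes after construction
    scale::Float64   # small field, cheap to serialize
end

big_matrix = randn(10_000, 100)

# Wrap the large object so that it is flagged as immutable.
target = MyTarget(Immutable(big_matrix), 1.0)
```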

For an example where this is used, see here.