Checkpoints

Pigeons can write a "checkpoint" periodically to ensure that no more than half of the work is lost in the event of, e.g., a server failure. To enable this, pass checkpoint = true:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-02-14-23-37-44-zXfPafrC")
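One way to see where the "half" comes from: in the runs shown below, the number of scans doubles from one round to the next (2.05e+03, 4.1e+03, 8.19e+03, ...), so one round costs about as much as all earlier rounds combined. The sketch below is not part of Pigeons. It assumes that round r performs 2^r scans and that a checkpoint is written at the end of every round, and it computes the fraction of work lost if a failure happens just before round r is checkpointed.

```julia
# Hypothetical illustration (not Pigeons code), assuming round r performs 2^r scans
# and a checkpoint is written at the end of every round.
scans_in_round(r::Integer) = 2^r

# Worst case: the failure happens at the very end of round r, just before its checkpoint.
function worst_case_lost_fraction(r::Integer)
    saved = sum((scans_in_round(k) for k in 1:(r - 1)); init = 0)  # already checkpointed
    lost = scans_in_round(r)                                       # the interrupted round
    return lost / (saved + lost)
end

for r in (5, 11, 15)
    println("round ", r, ": lost fraction ", round(worst_case_lost_fraction(r), digits = 4))
end
```

Under these assumptions the fraction equals 2^r / (2^(r+1) - 2), which tends to one half as r grows; the early rounds, where it is larger, take a negligible amount of time.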

See write_checkpoint() for details of how this is accomplished in a way that is compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint; results/latest points to the folder of the most recent run, so its checkpoints are also reachable at results/latest/round=[x]/checkpoint.
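To make the folder layout concrete, the sketch below finds the most recent checkpoint inside one run's folder. The helper name find_latest_checkpoint is hypothetical and not part of the Pigeons API (PT(...) performs its own lookup); it only assumes the round=[x]/checkpoint layout described above.

```julia
# Hypothetical helper, not part of the Pigeons API. It assumes the layout
# [exec_folder]/round=[x]/checkpoint and returns the checkpoint folder of the
# highest round, or `nothing` if no round has been checkpointed yet.
function find_latest_checkpoint(exec_folder::AbstractString)
    best_round = -1
    best_path = nothing
    for entry in readdir(exec_folder)
        m = match(r"^round=(\d+)$", entry)
        m === nothing && continue                 # skip unrelated entries
        checkpoint = joinpath(exec_folder, entry, "checkpoint")
        isdir(checkpoint) || continue             # round folder exists but has no checkpoint
        round_index = parse(Int, m.captures[1])   # compare numerically, not as strings
        if round_index > best_round
            best_round = round_index
            best_path = checkpoint
        end
    end
    return best_path
end

# Usage on a fake run folder:
exec_folder = mktempdir()
mkpath(joinpath(exec_folder, "round=9", "checkpoint"))
mkpath(joinpath(exec_folder, "round=10", "checkpoint"))
mkpath(joinpath(exec_folder, "round=11"))   # round still running, no checkpoint yet
find_latest_checkpoint(exec_folder)          # returns the round=10 checkpoint path
```

Round indices are parsed as integers because a plain string sort would place round=10 before round=9.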

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results into a single interactive session.

To load a checkpoint, create a PT struct by passing the path of the run's folder. For example, to reload the latest checkpoint of the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:44
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0112   1.03e+05       -115      0.188      0.201
  4.1e+03       7.16      0.022   7.79e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

This is useful when you realize that you need more CPUs or machines. Continuing the above example, we now do one more round, but this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:44
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.431   3.23e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:44
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19      0.488   6.29e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful for all the machines to write it to disk at each round. To avoid this, wrap the large immutable object in an Immutable struct.

For an example where this is used, see here.
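The idea behind Immutable is that unchanging data only needs to be written once rather than with every checkpoint. The sketch below illustrates that idea with Julia's Serialization standard library. It is not how Pigeons implements Immutable: ImmutableData, ToyState, write_round!, read_round and the immutables.jls/state.jls file names are all hypothetical.

```julia
using Serialization

# Hypothetical stand-ins (not Pigeons types): a wrapper marking data that never
# changes, and a state bundling it with a small part that changes every round.
struct ImmutableData{T}
    data::T
end

mutable struct ToyState
    large::ImmutableData{Vector{Float64}}   # big, never modified
    position::Vector{Float64}                # small, updated every round
end

# Write one round's checkpoint: the immutable payload goes to a shared file that
# is written only once; each round folder stores only the mutable part.
function write_round!(exec_folder, round_index, state::ToyState)
    shared = joinpath(exec_folder, "immutables.jls")
    isfile(shared) || serialize(shared, state.large)
    round_dir = joinpath(exec_folder, "round=$round_index")
    mkpath(round_dir)
    serialize(joinpath(round_dir, "state.jls"), state.position)
    return nothing
end

# Rebuild the full state of a given round from the shared and per-round files.
function read_round(exec_folder, round_index)
    large = deserialize(joinpath(exec_folder, "immutables.jls"))
    position = deserialize(joinpath(exec_folder, "round=$round_index", "state.jls"))
    return ToyState(large, position)
end

# Usage: three rounds, but the large vector is serialized only once.
exec_folder = mktempdir()
state = ToyState(ImmutableData(rand(10_000)), zeros(3))
for round_index in 1:3
    state.position .+= 1.0                  # pretend sampling moved the state
    write_round!(exec_folder, round_index, state)
end
restored = read_round(exec_folder, 3)
```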