Checkpoints

Pigeons can periodically write a "checkpoint" so that no more than half of the work is lost in the event of, e.g., a server failure. Checkpointing is enabled as follows:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-12-11-01-46-51-7hpJI4U0")

See write_checkpoint() for details of how this is accomplished in a way compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint. The checkpoints of the latest run are also reachable through results/latest, i.e. at results/latest/round=[x]/checkpoint.
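To check which rounds have a checkpoint on disk, the folder layout described above can be scanned with plain Julia. This is only a sketch: the helper name `checkpointed_rounds` is hypothetical and not part of Pigeons, and it assumes nothing beyond the `round=[x]/checkpoint` layout.

```julia
# Hypothetical helper (not part of Pigeons): list the rounds that have a
# checkpoint folder inside an execution folder, assuming the layout
# [exec folder]/round=[x]/checkpoint described above.
function checkpointed_rounds(exec_folder::AbstractString)
    rounds = Int[]
    isdir(exec_folder) || return rounds          # no such run: nothing to list
    for name in readdir(exec_folder)
        m = match(r"^round=(\d+)$", name)
        m === nothing && continue                # skip entries that are not round folders
        if isdir(joinpath(exec_folder, name, "checkpoint"))
            push!(rounds, parse(Int, m.captures[1]))
        end
    end
    return sort!(rounds)                         # numeric order, so 10 comes after 2
end

# Rounds available in the latest run (empty if nothing was checkpointed)
println(checkpointed_rounds("results/latest"))
```

Sorting numerically rather than by folder name matters here, since "round=10" would otherwise sort before "round=2".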

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results in one interactive session.

To load a checkpoint, create a PT struct by passing in the path to the execution folder as a string. For example, to reload the latest checkpoint from the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0108   9.66e+04       -115      0.188      0.201
  4.1e+03       7.16     0.0212    7.2e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

Resuming from a checkpoint is also useful when you realize partway through a run that you need more CPUs or machines. Continuing the above example, we now do one more round, this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.538   3.42e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19      0.604   6.68e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful to have every machine write it to the checkpoint at each round. To avoid this, wrap the large immutable object in an Immutable struct.
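As a minimal sketch of the wrapping step, the snippet below stores a large dataset inside a target type through Immutable. The type `MyLogPotential` and its field name are hypothetical, and the code assumes that Immutable is constructed from the object it wraps and exposes it through a `data` field; check the Immutable docstring before relying on either.

```julia
using Pigeons

# Hypothetical target type for illustration: the large, read-only dataset is
# wrapped in Immutable so that it does not need to be re-written at each round.
struct MyLogPotential
    observations::Pigeons.Immutable{Vector{Float64}}
end

big_data = randn(1_000_000)   # stand-in for a large dataset
log_potential = MyLogPotential(Pigeons.Immutable(big_data))

# Assumption: the wrapped object is available in the `data` field of Immutable.
n = length(log_potential.observations.data)
```

Code that evaluates the target then reads the dataset through the wrapper, while the rest of the struct is checkpointed as usual.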

For an example where this is used, see here.