Checkpoints

Pigeons can periodically write a "checkpoint" so that no more than half of the work is lost if, for example, a server fails. This is enabled as follows:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-12-20-17-51-22-dHgxdb7h")
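The "half of the work" figure can be checked with a little arithmetic. The sketch below is not Pigeons code; it assumes that round r performs 2^r scans (consistent with the scan counts doubling in the tables on this page) and that a checkpoint is written at the end of each round. The function names are hypothetical. Under these assumptions, the worst case is a failure just before a round finishes, in which case that entire round is lost.

```julia
# Assumption: round r performs 2^r scans, and a checkpoint is written
# at the end of every round.
scans_in_round(r) = 2^r

# Worst case: the failure happens just before round r completes, so all
# of round r is lost while rounds 1..r-1 are safe on disk.
function worst_case_lost_fraction(r)
    kept = sum(scans_in_round, 1:r-1; init = 0)
    lost = scans_in_round(r)
    return lost / (kept + lost)
end

for r in (5, 10, 14)
    println("round ", r, ": worst-case fraction lost = ", worst_case_lost_fraction(r))
end
```

Under these assumptions the worst-case fraction tends to one half as the round index grows, so the guarantee is best read as "about half" in the earliest rounds.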

See write_checkpoint() for details of how this is accomplished in a way compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint, with the latest run in results/latest/[unique folder]/round=[x]/checkpoint.
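To see which rounds have a checkpoint on disk, the folder layout above can be walked with the Julia standard library. The helper below is a hypothetical illustration, not part of Pigeons; it assumes only the round=[x]/checkpoint naming described above and takes the path of one execution folder.

```julia
# Hypothetical helper (not part of Pigeons): return the checkpoint folder of
# the highest-numbered round inside `exec_folder`, assuming the layout
# [exec folder]/round=[x]/checkpoint.
function latest_checkpoint_folder(exec_folder::AbstractString)
    best_round = -1
    best_path = nothing
    for name in readdir(exec_folder)
        m = match(r"^round=(\d+)$", name)
        m === nothing && continue                 # not a round folder
        candidate = joinpath(exec_folder, name, "checkpoint")
        isdir(candidate) || continue              # round without a checkpoint
        round_index = parse(Int, m.captures[1])   # compare numerically, not as strings
        if round_index > best_round
            best_round = round_index
            best_path = candidate
        end
    end
    return best_path   # `nothing` when no checkpoint was found
end
```

Comparing round indices as integers matters here: sorting the folder names as strings would place round=10 before round=9.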

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results into one interactive session.

To load a checkpoint, create a PT struct by passing the path to the checkpoint folder as a string. For example, to reload the latest checkpoint from the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0106   9.66e+04       -115      0.188      0.201
  4.1e+03       7.16     0.0209    7.2e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

Resuming from a checkpoint is also useful when you realize partway through that you need more CPUs or machines. Continuing the above example, we now do one more round, this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.597   3.42e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19      0.651   6.68e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful to have every machine write it to disk at each round. To avoid this, wrap the large immutable object in an Immutable struct.
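A minimal sketch of the wrapping step follows. The target type MyTarget and the variable names are hypothetical, and the sketch assumes that Immutable can be constructed directly from the object to wrap; check the Immutable docstring for the exact constructor and for how the wrapped data is read back.

```julia
using Pigeons

# Hypothetical target type, for illustration only. A real target also needs
# the methods Pigeons expects of a target, which are omitted here.
struct MyTarget{D}
    data::D   # large dataset that is never modified during sampling
end

# Assumption: Immutable(x) wraps x so that it is not rewritten to disk
# with every checkpoint.
big_matrix = randn(10_000, 100)
target = MyTarget(Immutable(big_matrix))
```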

For an example where this is used, see here.