Checkpoints

Pigeons can periodically write a "checkpoint" so that no more than half of the work is lost in the event of, e.g., a server failure. Checkpointing is enabled as follows:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-09-03-17-49-45-wCZj3itY")

See write_checkpoint() for details of how this is accomplished in a way compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint, with the latest run in results/latest/[unique folder]/round=[x]/checkpoint.
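To make the folder layout concrete, here is a minimal sketch that finds the most recent checkpoint inside one run's folder. It uses only the Julia standard library and assumes nothing beyond the `round=[x]/checkpoint` naming described above. The helper name `latest_checkpoint` is hypothetical and is not part of Pigeons.

```julia
# Hypothetical helper (not part of Pigeons): return the path of the
# highest-numbered `round=[x]/checkpoint` folder inside `exec_folder`,
# or `nothing` if no round has written a checkpoint yet.
function latest_checkpoint(exec_folder::AbstractString)
    best_round, best_path = -1, nothing
    for name in readdir(exec_folder)
        m = match(r"^round=(\d+)$", name)
        m === nothing && continue            # skip unrelated entries
        candidate = joinpath(exec_folder, name, "checkpoint")
        isdir(candidate) || continue         # round folder without a checkpoint
        r = parse(Int, m.captures[1])        # compare rounds numerically, not as strings
        if r > best_round
            best_round, best_path = r, candidate
        end
    end
    return best_path
end
```

For instance, `latest_checkpoint("results/latest")` would inspect the most recent run, assuming `results/latest` points at that run's folder. Parsing the round number matters because a plain string sort would place `round=10` before `round=2`.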

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results in one interactive session.

To load a checkpoint, create a PT struct by passing in the path to the checkpoint folder as a string. For example, to reload the latest checkpoint from the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0106   1.03e+05       -115      0.188      0.201
  4.1e+03       7.16      0.021   7.79e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

This is useful when you realize that you will need more CPUs or machines to help. Continuing the example above, we now do one more round, this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.386   3.23e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19      0.475   6.29e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful to have all the machines write it at each round. To avoid this, encapsulate the large immutable object into an Immutable struct.

For an example where this is used, see here.
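As a rough illustration of the wrapping step, the sketch below stores a large array inside a target through `Immutable`. The `RegressionTarget` type and its fields are hypothetical, and the code a real target needs in order to be sampled (for example its log density) is omitted. The sketch also assumes that `Immutable` can be constructed directly from the object it wraps. How the wrapped value is read back is not covered in this section, so consult the `Immutable` docstring for that.

```julia
using Pigeons

# Hypothetical target type, for illustration only. `covariates` stands in
# for a large object that never changes during sampling.
struct RegressionTarget{I}
    covariates::I          # holds the Immutable wrapper, not the raw array
    prior_scale::Float64   # small, cheap-to-write field stored as usual
end

big_matrix = randn(10_000, 50)

# Assumption: Immutable accepts the wrapped object as its single argument.
target = RegressionTarget(Pigeons.Immutable(big_matrix), 2.5)
```

Only the large, unchanging part is wrapped; small fields are left as ordinary values, since wrapping them brings no benefit.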