Checkpoints

Pigeons can periodically write a "checkpoint" so that no more than about half of the work is lost in the event of, e.g., a server failure. This is enabled as follows:

using Pigeons
pt = pigeons(target = toy_mvn_target(100), checkpoint = true)
PT("/home/runner/work/Pigeons.jl/Pigeons.jl/docs/build/results/all/2024-11-14-21-19-49-sPBdPEIb")
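The "half of the work" figure can be checked with a little arithmetic. The sketch below assumes that round r performs 2^r scans (consistent with the scans column in the reports further down) and that a checkpoint is written at the end of each round (consistent with the round=[x]/checkpoint layout). The function names are hypothetical and are not part of Pigeons.

```julia
# Assumption: round r performs 2^r scans, and a checkpoint is written
# at the end of every round.
scans_in_round(r) = 2^r

# Worst case: the failure happens just before round r finishes, so all of
# round r is lost while rounds 1..r-1 can be recovered from the last checkpoint.
function worst_case_lost_fraction(r)
    saved = sum(scans_in_round, 1:(r - 1); init = 0)
    lost = scans_in_round(r)
    return lost / (saved + lost)
end

for r in (2, 5, 10, 14)
    println("round ", r, ": worst-case lost fraction = ",
            round(worst_case_lost_fraction(r), digits = 4))
end
```

Under these assumptions the worst-case fraction approaches one half from above as the number of rounds grows, so "half" is best read as an approximation that becomes tight for long runs.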

See write_checkpoint() for details of how this is accomplished in a way compatible with both the single-machine and MPI contexts. Each checkpoint is located in results/all/[unique folder]/round=[x]/checkpoint, with the latest run in results/latest/[unique folder]/round=[x]/checkpoint.
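To make the folder layout concrete, here is a minimal sketch that builds a mock execution folder in a temporary directory and finds the most recent round containing a checkpoint. It uses only Julia's standard filesystem functions. The helper latest_checkpointed_round is hypothetical and written for illustration; Pigeons resolves the latest checkpoint itself when given a path.

```julia
# Build a mock execution folder following the layout described above.
exec_folder = mktempdir()
for r in 1:3
    mkpath(joinpath(exec_folder, "round=$r", "checkpoint"))
end

# Return the largest x such that exec_folder/round=x/checkpoint exists,
# or `nothing` if no round has a checkpoint.
function latest_checkpointed_round(exec_folder)
    rounds = Int[]
    for name in readdir(exec_folder)
        m = match(r"^round=(\d+)$", name)
        m === nothing && continue
        isdir(joinpath(exec_folder, name, "checkpoint")) || continue
        push!(rounds, parse(Int, m.captures[1]))
    end
    return isempty(rounds) ? nothing : maximum(rounds)
end

latest = latest_checkpointed_round(exec_folder)
println("latest checkpointed round: ", latest)
```

The round index is parsed as an integer rather than compared as a string, so that round=10 sorts after round=9.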

Checkpoints are also useful when an MPI-distributed PT has been run and the user wants to load the full set of results in one interactive session.

To load a checkpoint, create a PT struct by passing in the path to the checkpoint folder as a string. For example, to reload the latest checkpoint from the latest run and perform more sampling:

pt = PT("results/latest")

# do two more rounds of sampling
pt = increment_n_rounds!(pt, 2)
pt = pigeons(pt)
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 2.05e+03       7.19     0.0107   1.03e+05       -115      0.188      0.201
  4.1e+03       7.16     0.0213   7.79e+04       -115      0.184      0.205
────────────────────────────────────────────────────────────────────────────

Checkpoints are also useful when you realize partway through a run that you need more CPUs or machines. Continuing the example above, we now do one more round, this time using 2 local MPI processes:

pt = increment_n_rounds!(pt, 1)
result = pigeons(pt.exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 8.19e+03       7.18      0.713   3.42e+07       -115      0.192      0.203
────────────────────────────────────────────────────────────────────────────

We conclude with an example showing that it is not necessary to load the PT object into the interactive node in order to increase the number of rounds:

new_exec_folder = increment_n_rounds!(result.exec_folder, 1)
result = pigeons(new_exec_folder, ChildProcess(n_local_mpi_processes = 2))
Entangler initialized 2 MPI processes; 1 threads per process
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
┌ Warning: The set of successful reports changed
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/pt/report.jl:46
────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
 1.64e+04       7.19       0.85   6.68e+07       -115       0.19      0.201
────────────────────────────────────────────────────────────────────────────

Large immutable data

If part of a target is a large immutable object, it is wasteful to have every machine write it again at each round. To avoid this, wrap the large immutable object in an Immutable struct.

For an example where this is used, see here.
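The idea behind writing immutable data only once can be illustrated without Pigeons. The sketch below is a conceptual stand-in, not Pigeons' actual Immutable implementation: the WriteOnce wrapper and the store_once and write_round helpers are hypothetical names, and the file layout is an assumption chosen for the demonstration. The large object is serialized once under a key derived from its hash, and each per-round file stores only the small mutable state plus that key.

```julia
using Serialization

# Hypothetical wrapper for illustration only; this is not Pigeons' Immutable.
struct WriteOnce{T}
    data::T
end

# Serialize the wrapped data to shared_dir/<hash>.jls only if that file
# does not exist yet, and return the key identifying it.
function store_once(shared_dir, w::WriteOnce)
    key = string(hash(w.data), base = 16)
    path = joinpath(shared_dir, key * ".jls")
    isfile(path) || serialize(path, w.data)
    return key
end

# A per-round file holds the small mutable state plus the key of the shared data.
function write_round(dir, round_index, state, key)
    path = joinpath(dir, "round=$round_index.jls")
    serialize(path, (state = state, key = key))
    return path
end

dir = mktempdir()
big = WriteOnce(collect(1.0:1_000_000.0))
checkpoint_paths = [write_round(dir, r, rand(3), store_once(dir, big)) for r in 1:3]

# Restoring a round: read the small file, then follow the key to the shared data.
restored = deserialize(checkpoint_paths[end])
data = deserialize(joinpath(dir, restored.key * ".jls"))
println("shared file: ", filesize(joinpath(dir, restored.key * ".jls")), " bytes")
println("per-round file: ", filesize(checkpoint_paths[end]), " bytes")
```

After three rounds the directory holds one large shared file and three small per-round files, rather than three copies of the large object.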