Post-processing for MPI runs (plotting, summaries, etc)
Two options are available to post-process samples produced from MPI runs: (1) loading the distributed output back into your interactive shell, or (2) perform post-processing by loading samples from disk one at a time.
Option (1) is more convenient than (2) but it uses more RAM.
Loading the distributed output back into your interactive shell
Many of Pigeons' post-processing tools take as input a PT
struct. When running locally, pigeons()
returns a PT
struct, however, when running a job via MPIProcesses
or ChildProcess
, pigeons()
returns a Result
struct (which only holds the directory where samples are stored).
Use Pigeons.load(..)
to convert a Result
into a PT
struct. This will load the information distributed across several machines into the interactive node.
Once you have a PT
struct, proceed in the same way as when running PT locally, e.g. see the page on plotting, the page on online statistics, and the page on sample summaries and diagnostics.
For example, here is how to modify the posterior density and trace plot example from the plotting page to run as a local MPI job instead of in-process (the lines differing from the local version are marked with (*)):
using DynamicPPL
using Pigeons
using MCMCChains
using StatsPlots
plotlyjs()
# example target: Binomial likelihood with parameter p = p1 * p2
an_unidentifiable_model = Pigeons.toy_turing_unid_target(100, 50)
pt_result = pigeons(target = an_unidentifiable_model,
# (*) run in two new MPI processes
# make sure the MPI processes load DynamicPPL
on = ChildProcess(n_local_mpi_processes = 2, dependencies=[DynamicPPL]),
# (*) signal that we want the PT object to be
# serialized at the end of each round
checkpoint = true,
n_rounds = 12,
# make sure to record the trace
# (each machine keeps its own during sampling)
record = [traces; round_trip; record_default()])
# (*) load the result across all machines into this interactive node
pt = Pigeons.load(pt_result)
# collect the statistics and convert to MCMCChains' Chains
# to have axes labels matching variable names in Turing and Stan
samples = Chains(pt)
# create the trace plots
my_plot = StatsPlots.plot(samples)
StatsPlots.savefig(my_plot, "mpi_posterior_densities_and_traces.html");
Entangler initialized 2 MPI processes; 1 threads per process
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
scans restarts Λ time(s) allc(B) log(Z₁/Z₀) min(α) mean(α) min(αₑ) mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
2 0 3.24 0.213 7.14e+05 -8.14 0.00178 0.64 1 1
4 0 1.64 0.516 1.11e+06 -5.04 0.0352 0.818 1 1
8 0 1.17 0.0369 2.18e+06 -4.42 0.708 0.871 1 1
16 1 1.2 0.00525 4.48e+06 -4.03 0.549 0.867 1 1
32 6 1.11 0.016 8.71e+06 -4.77 0.754 0.877 1 1
64 11 1.35 0.0185 1.75e+07 -4.79 0.698 0.85 1 1
128 25 1.6 0.0444 3.43e+07 -4.97 0.725 0.823 1 1
256 43 1.51 0.0764 6.8e+07 -4.92 0.758 0.832 1 1
512 92 1.46 0.15 1.37e+08 -5 0.806 0.838 1 1
1.02e+03 188 1.49 0.322 2.74e+08 -4.92 0.798 0.834 1 1
2.05e+03 384 1.5 0.597 5.5e+08 -4.96 0.811 0.834 1 1
4.1e+03 748 1.48 1.22 1.09e+09 -4.94 0.826 0.835 1 1
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Perform post-processing by loading samples from disk one at a time
Here instead of keeping samples in memory, we instruct the machines to store them on the fly in a shared directory. We do this using the disk
recorder
.
Then we process the sample one at the time using process_sample()
.
Here is an example where the target is 1000-dimensional but we are only interested in the first coordinate:
using Pigeons
using Plots
# example target: a 1000 dimensional target
high_d_target = Pigeons.toy_mvn_target(1000)
pt_result = pigeons(target = high_d_target,
# run in two new MPI processes
on = ChildProcess(n_local_mpi_processes = 2),
checkpoint = true,
# save samples to disk as we go
record = [disk])
# process the samples one by one, keeping only the first dimension
first_dim_of_each = Vector{Float64}()
process_sample(pt_result) do chain, scan, sample
# each sample here is a Vector{Float64} of length 1000
# in general, it will is produced by extract_sample()
push!(first_dim_of_each, sample[1])
end
plotlyjs()
myplot = Plots.plot(first_dim_of_each)
Plots.savefig(myplot, "first_dim_of_each.html");
Entangler initialized 2 MPI processes; 1 threads per process
──────────────────────────────────────────────────────
scans Λ log(Z₁/Z₀) min(α) mean(α)
────────── ────────── ────────── ────────── ──────────
2 9 -1.18e+03 7.04e-107 0.000359
4 8.97 -1.17e+03 1.31e-102 0.00346
8 8.38 -1.2e+03 1.13e-107 0.0692
16 8.94 -1.18e+03 1.65e-93 0.00618
32 8.96 -1.16e+03 1.45e-70 0.00487
64 8.85 -1.17e+03 2.13e-83 0.0164
128 8.88 -1.16e+03 1.23e-68 0.0129
256 8.92 -1.15e+03 4.23e-64 0.00884
512 8.92 -1.16e+03 5.16e-68 0.00893
1.02e+03 8.93 -1.16e+03 6.65e-66 0.00732
──────────────────────────────────────────────────────