Post-processing for MPI runs (plotting, summaries, etc.)

Two options are available to post-process samples produced from MPI runs: (1) load the distributed output back into your interactive shell, or (2) post-process the samples by loading them from disk one at a time.

Option (1) is more convenient than (2), but it uses more RAM.

Loading the distributed output back into your interactive shell

Many of Pigeons' post-processing tools take a PT struct as input. When running locally, pigeons() returns a PT struct; however, when a job is run via MPIProcesses or ChildProcess, pigeons() returns a Result struct, which only holds the directory where samples are stored.

Use Pigeons.load(..) to convert a Result into a PT struct. This loads the information distributed across several machines back into the interactive node.

Once you have a PT struct, proceed in the same way as when running PT locally, e.g. see the page on plotting, the page on online statistics, and the page on sample summaries and diagnostics.

For example, here is how to modify the posterior density and trace plot example from the plotting page to run as a local MPI job instead of in-process (the lines differing from the local version are marked with (*)):

using DynamicPPL
using Pigeons
using MCMCChains
using StatsPlots
plotlyjs()

# example target: Binomial likelihood with parameter p = p1 * p2
an_unidentifiable_model = Pigeons.toy_turing_unid_target(100, 50)

pt_result = pigeons(target = an_unidentifiable_model,
                # (*) run in two new MPI processes
                # make sure the MPI processes load DynamicPPL
                on = ChildProcess(n_local_mpi_processes = 2, dependencies=[DynamicPPL]),
                # (*) signal that we want the PT object to be
                #     serialized at the end of each round
                checkpoint = true,
                n_rounds = 12,
                # make sure to record the trace
                # (each machine keeps its own during sampling)
                record = [traces; round_trip; record_default()])

# (*) load the result across all machines into this interactive node
pt = Pigeons.load(pt_result)

# collect the statistics and convert to MCMCChains' Chains
# to have axes labels matching variable names in Turing and Stan
samples = Chains(pt)

# create the trace plots
my_plot = StatsPlots.plot(samples)
StatsPlots.savefig(my_plot, "mpi_posterior_densities_and_traces.html");
Entangler initialized 2 MPI processes; 1 threads per process
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
  scans     restarts      Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)    min(αₑ)   mean(αₑ)
────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2          0       3.24      0.177   7.14e+05      -8.14    0.00178       0.64          1          1
        4          0       1.64      0.478   1.11e+06      -5.04     0.0352      0.818          1          1
        8          0       1.17     0.0464   2.18e+06      -4.42      0.708      0.871          1          1
       16          1        1.2    0.00529   4.48e+06      -4.03      0.549      0.867          1          1
       32          6       1.11    0.00927   8.71e+06      -4.77      0.754      0.877          1          1
       64         11       1.35     0.0235   1.75e+07      -4.79      0.698       0.85          1          1
      128         25        1.6     0.0413   3.43e+07      -4.97      0.725      0.823          1          1
      256         43       1.51     0.0775    6.8e+07      -4.92      0.758      0.832          1          1
      512         92       1.46      0.158   1.37e+08         -5      0.806      0.838          1          1
 1.02e+03        188       1.49       0.29   2.74e+08      -4.92      0.798      0.834          1          1
 2.05e+03        384        1.5      0.584    5.5e+08      -4.96      0.811      0.834          1          1
  4.1e+03        748       1.48       1.18   1.09e+09      -4.94      0.826      0.835          1          1
─────────────────────────────────────────────────────────────────────────────────────────────────────────────
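Once loaded, the PT struct supports the same workflows as a local run. For instance, assuming the pt object from the run above, MCMCChains' standard summary utilities can be applied to the converted Chains object (a sketch; summarystats and quantile are MCMCChains functions, not part of Pigeons itself):

```julia
using MCMCChains

# convert the loaded PT struct to MCMCChains' Chains, as above
samples = Chains(pt)

# standard MCMCChains summaries on the combined samples
summarystats(samples)  # per-variable mean, std, ESS, etc.
quantile(samples)      # per-variable posterior quantiles
```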

Performing post-processing by loading samples from disk one at a time

Here, instead of keeping samples in memory, we instruct the machines to write them to a shared directory on the fly. We do this using the disk recorder.

Then we process the samples one at a time using process_sample().

Here is an example where the target is 1000-dimensional but we are only interested in the first coordinate:

using Pigeons
using Plots

# example target: a 1000 dimensional target
high_d_target = Pigeons.toy_mvn_target(1000)

pt_result = pigeons(target = high_d_target,
                # run in two new MPI processes
                on = ChildProcess(n_local_mpi_processes = 2),
                checkpoint = true,
                # save samples to disk as we go
                record = [disk])

# process the samples one by one, keeping only the first dimension
first_dim_of_each = Vector{Float64}()
process_sample(pt_result) do chain, scan, sample
    # each sample here is a Vector{Float64} of length 1000
    # in general, it is produced by extract_sample()
    push!(first_dim_of_each, sample[1])
end

plotlyjs()
myplot = Plots.plot(first_dim_of_each)
Plots.savefig(myplot, "first_dim_of_each.html");
Entangler initialized 2 MPI processes; 1 threads per process
──────────────────────────────────────────────────────
  scans        Λ      log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ──────────
        2          9  -1.18e+03  7.04e-107   0.000359
        4       8.97  -1.17e+03  1.31e-102    0.00346
        8       8.38   -1.2e+03  1.13e-107     0.0692
       16       8.94  -1.18e+03   1.65e-93    0.00618
       32       8.96  -1.16e+03   1.45e-70    0.00487
       64       8.85  -1.17e+03   2.13e-83     0.0164
      128       8.88  -1.16e+03   1.23e-68     0.0129
      256       8.92  -1.15e+03   4.23e-64    0.00884
      512       8.92  -1.16e+03   5.16e-68    0.00893
 1.02e+03       8.93  -1.16e+03   6.65e-66    0.00732
──────────────────────────────────────────────────────
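The callback pattern above also allows keeping memory use constant: rather than pushing every value into a vector, each sample can be folded into a running statistic inside the process_sample() callback. Here is a minimal, self-contained sketch of such an accumulator (Welford's online algorithm, plain Julia; the RunningStats type and update! function are illustrative names, not part of the Pigeons API):

```julia
# Running mean/variance accumulator (Welford's online algorithm).
# Inside a `process_sample` callback, one would call `update!`
# on each incoming sample instead of storing it.
mutable struct RunningStats
    n::Int        # number of samples seen so far
    mean::Float64 # running mean
    m2::Float64   # running sum of squared deviations
end
RunningStats() = RunningStats(0, 0.0, 0.0)

function update!(s::RunningStats, x::Real)
    s.n += 1
    delta = x - s.mean
    s.mean += delta / s.n
    s.m2 += delta * (x - s.mean)
    return s
end

# unbiased sample variance (undefined for fewer than 2 samples)
variance(s::RunningStats) = s.n < 2 ? NaN : s.m2 / (s.n - 1)

# usage: accumulate values one at a time, storing none of them
stats = RunningStats()
for x in [1.0, 2.0, 3.0, 4.0]
    update!(stats, x)
end
stats.mean        # 2.5
variance(stats)   # ≈ 1.667
```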