Saving traces

The traces refer to the list of samples $X_1, X_2, \dots, X_n$ from which we can approximate expectations of the form $E[f(X)]$, where $X \sim \pi$ via a Monte Carlo average of the form $\sum_i f(X_i) / n$.

To indicate that the traces should be saved, use

using DynamicPPL
using Pigeons

target = Pigeons.toy_turing_unid_target(100, 50)

pt = pigeons(;  target,
                n_rounds = 3,
                # make sure to record the trace:
                record = [traces; round_trip; record_default()])
PT(checkpoint = false, ...)

Note that there are more memory efficient alternatives to saving the full traces: see online (constant-memory) statistics and off-memory processing.

Accessing traces

Use get_sample to access the list of samples:

get_sample(pt)
8-element Pigeons.SampleArray{Vector{Float64}, PT}:
 [0.5480531395881381, 0.7111587788265167, -8.001156932600198]
 [0.6417578593639657, 0.8193098317534669, -6.044521492146245]
 [0.5635640490721443, 0.9512261759202737, -7.265017408328339]
 [0.5372315596959659, 0.9417959416828765, -6.833608590115077]
 [0.6424119190840571, 0.6614625444898359, -6.638153033637418]
 [0.7108478937373857, 0.7534993485537478, -6.0508454283065785]
 [0.6351986896221519, 0.7398042493665612, -5.8220440272267]
 [0.8062385791654489, 0.6066887061763869, -5.843884557029044]

In the special case where each state is a vector, use sample_names to obtain description of the vector components:

sample_names(pt)
3-element Vector{Symbol}:
 :p1
 :p2
 :log_density

Still in the special case where each state is a vector, it is often convenient to organize the result into a single array, this is done using sample_array:

sample_array(pt)
8×3×1 Array{Float64, 3}:
[:, :, 1] =
 0.548053  0.711159  -8.00116
 0.641758  0.81931   -6.04452
 0.563564  0.951226  -7.26502
 0.537232  0.941796  -6.83361
 0.642412  0.661463  -6.63815
 0.710848  0.753499  -6.05085
 0.635199  0.739804  -5.82204
 0.806239  0.606689  -5.84388

Customizing what is saved in the traces

You may want to save only some statistics of interest, or a subset of the dimensions to take up less memory.

We show here an example saving only the value of the first coordinate:

struct OnlyFirstExtractor end

Pigeons.extract_sample(state, log_potential, extractor::OnlyFirstExtractor) =
    Pigeons.extract_sample(state, log_potential)[1:1]


pt = pigeons(;  target,
                n_rounds = 3,
                # custom method to extract samples:
                extractor = OnlyFirstExtractor(),
                # make sure to record the trace:
                record = [traces; round_trip; record_default()])

sample_array(pt)
8×1×1 Array{Float64, 3}:
[:, :, 1] =
 0.5480531395881381
 0.6417578593639657
 0.5635640490721443
 0.5372315596959659
 0.6424119190840571
 0.7108478937373857
 0.6351986896221519
 0.8062385791654489

Optionally, it is a good idea to also adjust the behaviour of sample_names accordingly. For example, variables_names gets called when creating MCMCChains object so that e.g. plots are labelled correctly.

Pigeons.sample_names(state, log_potential, extractor::OnlyFirstExtractor) =
    Pigeons.sample_names(state, log_potential)[1:1]

To keep only the value of the log potential, you can use the following built-in LogPotentialExtractor:

pt = pigeons(;  target,
                n_rounds = 3,
                # custom method to extract samples:
                extractor = Pigeons.LogPotentialExtractor(),
                # make sure to record the trace:
                record = [traces; round_trip; record_default()])

sample_array(pt)
8×1×1 Array{Float64, 3}:
[:, :, 1] =
 -8.001156932600198
 -6.044521492146245
 -7.265017408328354
 -6.8336085901150785
 -6.638153033637418
 -6.0508454283065785
 -5.8220440272267
 -5.843884557029044