Targeting a non-Julian model

Suppose you have some code implementing vanilla MCMC, written in an arbitrary "foreign" language such as C++, Python, R, Java, etc. You would like to turn this vanilla MCMC code into a Parallel Tempering algorithm able to harness large numbers of cores, including distributing this algorithm over MPI. However, you do not wish to learn anything about MPI/multi-threading/Parallel Tempering.

Surprisingly, it is very simple to bridge such code with Pigeons. The only requirement on the "foreign" language is that it can read from standard input and write to standard output, so virtually any language can be interfaced in this fashion. Based on this minimalist "standard stream bridge" with worker processes running foreign code (one such process per replica; not necessarily running on the same machine), Pigeons will coordinate the execution of an adaptive non-reversible parallel tempering algorithm.

This behaviour is implemented in StreamTarget; see its documentation for details. In a nutshell, there is one child process for each PT chain. These processes need not be on the same machine: indeed, distributed sampling is the key use case of this bridge. Pigeons does some lightweight coordination with these child processes to orchestrate non-reversible parallel tempering. Interprocess communication only involves Pigeons telling each child process to perform exploration at a Pigeons-provided annealing parameter.

StreamTarget implements log_potential and explorer by invoking worker processes via standard stream communication. Standard streams are less efficient than alternatives such as protobuf, but they have the advantage of being supported by nearly all programming languages in existence. Also, in many practical cases the bridge is unlikely to be the bottleneck, since the worker process is invoked only three times per chain per iteration (overhead is on the order of 0.1ms per interprocess call).
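To make the idea of a standard stream bridge concrete, here is a minimal sketch of a parent process exchanging one-line messages with a child process. It is only an illustration under stated assumptions and is not the protocol StreamTarget uses: the message format (one number per line), the toy log density, and the names worker_code, worker_cmd and ask are all hypothetical. For convenience the child is itself a Julia process, standing in for code written in any language that can read standard input and write standard output.

```julia
# Hypothetical worker program, standing in for "foreign" code: it reads one
# number per line from standard input and writes a toy log density
# (standard normal, up to a constant) to standard output.
worker_code = """
for line in eachline(stdin)
    x = parse(Float64, line)
    println(-0.5 * x^2)
    flush(stdout)
end
"""

# Command used to launch the child process (here, another Julia instance).
worker_cmd = `$(Base.julia_cmd()) --startup-file=no -e $worker_code`

# Launch the child with both its standard input and standard output piped.
worker = open(worker_cmd, "r+")

# Send one request line, then block until the one-line reply arrives.
# (Hypothetical helper, not part of Pigeons.)
function ask(worker, x::Float64)
    println(worker, x)
    return parse(Float64, readline(worker))
end

value = ask(worker, 2.0)
println("worker replied: ", value)

# Closing the child's standard input ends its read loop, so it exits.
close(worker.in)
wait(worker)
```

In the actual bridge, the messages exchanged are defined by StreamTarget, so consult its documentation for the real format before writing a worker.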

Usage example

To demonstrate this capability, we show here how it enables running Blang models in Pigeons. Blang is a Bayesian modelling language designed for sampling combinatorial spaces such as phylogenetic trees.

We first set up Blang as follows (assuming Java 11 is available on the PATH):

using Pigeons

redirect_stdout(devnull) do
    Pigeons.setup_blang("blangDemos")
end
┌ Warning: Using a precompiled build for blangDemos: double check it is up to date
└ @ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/targets/BlangTarget.jl:141

Next, we run a Blang implementation of our usual unidentifiable toy example:

using Pigeons

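# Build a Blang-backed target for the unidentifiable product model.
# Note: the second argument is passed to the model's --model.nFails option.
blang_unidentifiable_example(n_trials, n_successes) =
    Pigeons.BlangTarget(
        `$(Pigeons.blang_executable("blangDemos", "demos.UnidentifiableProduct")) --model.nTrials $n_trials --model.nFails $n_successes`
    )

# Run parallel tempering on the Blang model.
pt = pigeons(target = blang_unidentifiable_example(100, 50))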
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
Preprocess {
  2 samplers constructed with following prototypes:
    RealScalar sampled via: [RealSliceSampler]
} [ endingBlock=Preprocess blockTime=357.9ms blockNErrors=0 ]
Inference {
  ────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2      0.992       4.42   1.47e+07      -4.05      0.621       0.89
        4       1.42     0.0414   9.13e+06      -4.43      0.192      0.842
        8       1.44     0.0645   1.82e+07      -5.35      0.581       0.84
       16        1.7      0.115   3.49e+07      -4.72      0.577      0.811
       32       1.51      0.224   6.99e+07       -4.8      0.602      0.833
       64       1.52      0.406   1.42e+08      -4.95      0.727      0.831
      128       1.56      0.521   2.79e+08      -4.93      0.713      0.827
      256       1.52      0.883   5.46e+08      -5.06       0.78      0.831
      512       1.52       1.51   1.07e+09      -4.97      0.793      0.831
 1.02e+03       1.54        2.8   2.11e+09      -4.97      0.789      0.829
────────────────────────────────────────────────────────────────────────────

As shown above, creating a StreamTarget amounts to specifying the command used to launch each child process.

To terminate the child processes associated with a stream target, use:

Pigeons.kill_child_processes(pt)