Targeting a non-Julian model

Suppose you have some code implementing vanilla MCMC, written in an arbitrary "foreign" language such as C++, Python, R, Java, etc. You would like to turn this vanilla MCMC code into a Parallel Tempering algorithm able to harness large numbers of cores, including distributing this algorithm over MPI. However, you do not wish to learn anything about MPI/multi-threading/Parallel Tempering.

Surprisingly, it is very simple to bridge such code with Pigeons. The only requirement on the "foreign" language is that it can read from standard input and write to standard output, so virtually any language can be interfaced in this fashion. Building on this minimalist "standard stream bridge", with worker processes running the foreign code (one process per replica, not necessarily on the same machine), Pigeons coordinates the execution of an adaptive non-reversible parallel tempering algorithm.
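To make the idea of a standard stream bridge concrete, here is a minimal sketch of the parent side in plain Julia, independent of Pigeons. It assumes a POSIX `cat` executable is available and uses it as a placeholder worker that echoes each request back; the helper name `request` and the message text are purely illustrative.

```julia
# Start a child process with both its standard input and standard output connected to us.
# `cat` is only a placeholder worker: it echoes each request line back unchanged.
child = open(`cat`, "r+")

# Send one request line, then block until the child writes one reply line.
function request(child, message::AbstractString)
    println(child, message)
    return readline(child)
end

reply = request(child, "explore at beta = 0.5")
println(reply)

close(child.in)  # closing the child's standard input lets `cat` terminate
wait(child)      # wait for the child process to exit
```

Any program that reads a line and writes a line could take the place of `cat` here, which is why the bridge places so few demands on the foreign language.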

This behaviour is implemented in StreamTarget; see its documentation for details. In a nutshell, there is one child process for each PT chain. These processes are not necessarily on the same machine: indeed, distributed sampling is the key use case of this bridge. Pigeons does some lightweight coordination with these child processes to orchestrate non-reversible parallel tempering. Interprocess communication only involves Pigeons telling each child process to perform exploration at a Pigeons-provided annealing parameter.
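The worker side of such a bridge can be sketched as a loop that reads one request per line and writes one response per line. The sketch below is written in Julia only for consistency with the rest of this page; a real worker would be written in the foreign language. Everything in it is an assumption made for illustration: the toy target, the names `serve` and `explore`, and in particular the `name(beta)` request and `response(...)` reply formats. The StreamTarget documentation is the reference for the actual protocol.

```julia
using Random

# Toy annealed target (illustrative): standard normal prior,
# unit-variance normal likelihood centred at 3.
log_prior(x) = -0.5 * x^2
log_likelihood(x) = -0.5 * (x - 3)^2
log_annealed(x, beta) = log_prior(x) + beta * log_likelihood(x)

# One random-walk Metropolis step targeting the annealed density at `beta`.
function explore(x, beta, rng)
    proposal = x + randn(rng)
    log_ratio = log_annealed(proposal, beta) - log_annealed(x, beta)
    return log(rand(rng)) < log_ratio ? proposal : x
end

# Serve requests until the input stream is closed; returns the final state.
# The message format is hypothetical, not necessarily the one Pigeons uses.
function serve(input::IO, output::IO; rng = Random.default_rng())
    x = 0.0  # the worker owns the chain state; it is never sent over the stream
    for line in eachline(input)
        m = match(r"^([a-z_!]+)\((.*)\)$", line)
        m === nothing && continue  # ignore anything that is not a request
        beta = parse(Float64, m[2])
        if m[1] == "log_potential"
            # report the annealed log density of the current state
            println(output, "response(", log_annealed(x, beta), ")")
        elseif m[1] == "call_sampler!"
            # perform exploration at the requested annealing parameter
            x = explore(x, beta, rng)
            println(output, "response()")
        end
    end
    return x
end
```

A standalone worker would call `serve(stdin, stdout)` and make sure each response is flushed before waiting for the next request. Note that only annealing parameters and scalar responses cross the process boundary, which keeps the messages small.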

StreamTarget implements log_potential and explorer by invoking worker processes via standard stream communication. Standard streams are less efficient than alternatives such as protobuf, but they have the advantage of being supported by nearly every programming language in existence. Also, in many practical cases the bridge is unlikely to be the bottleneck, since the worker process is invoked only three times per chain per iteration (the overhead is on the order of 0.1 ms per interprocess call).
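As a back-of-the-envelope check, the quoted figures can be combined into a rough per-chain estimate of the communication overhead. The function name `stream_overhead_seconds` is hypothetical, and the defaults are simply the order-of-magnitude numbers stated above, not measurements.

```julia
# Rough per-chain communication overhead in seconds.
# Defaults: three interprocess calls per iteration, about 0.1 ms per call.
stream_overhead_seconds(n_iterations; calls_per_iteration = 3, seconds_per_call = 1e-4) =
    n_iterations * calls_per_iteration * seconds_per_call

# Estimate for a chain running 1024 iterations.
println(round(stream_overhead_seconds(1024); digits = 3), " s")
```

Under these assumptions the overhead is about 0.3 ms per chain per iteration, or roughly 0.3 s over 1024 iterations, so it becomes negligible as soon as one exploration step costs substantially more than that.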

Usage example

To demonstrate this capability, we show here how it enables running Blang models in Pigeons. Blang is a Bayesian modelling language designed for sampling combinatorial spaces such as phylogenetic trees.

We first set up Blang as follows (assuming Java 11 is accessible through the PATH variable):

using Pigeons

redirect_stdout(devnull) do
    Pigeons.setup_blang("blangDemos")
end
┌ Warning: Using a precompiled build for blangDemos: double check it is up to date
@ Pigeons ~/work/Pigeons.jl/Pigeons.jl/src/targets/BlangTarget.jl:141

Next, we run a Blang implementation of our usual unidentifiable toy example:

using Pigeons

blang_unidentifiable_example(n_trials, n_successes) =
    Pigeons.BlangTarget(
        `$(Pigeons.blang_executable("blangDemos", "demos.UnidentifiableProduct")) --model.nTrials $n_trials --model.nFails $n_successes`
    )
pt = pigeons(target = blang_unidentifiable_example(100, 50))
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
Preprocess {
  2 samplers constructed with following prototypes:
    RealScalar sampled via: [RealSliceSampler]
} [ endingBlock=Preprocess blockTime=409.1ms blockNErrors=0 ]
Inference {
  ────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2      0.992       5.85   1.61e+07      -4.05      0.621       0.89
        4       1.42     0.0375   9.33e+06      -4.43      0.192      0.842
        8       1.44      0.113   1.77e+07      -5.35      0.581       0.84
       16        1.7      0.135   3.63e+07      -4.72      0.577      0.811
       32       1.51      0.231   6.76e+07       -4.8      0.602      0.833
       64       1.52      0.407   1.34e+08      -4.95      0.727      0.831
      128       1.56      0.595   2.67e+08      -4.93      0.713      0.827
      256       1.52       1.11   5.35e+08      -5.06       0.78      0.831
      512       1.52       1.69   1.05e+09      -4.97      0.793      0.831
 1.02e+03       1.54       3.04    2.1e+09      -4.97      0.789      0.829
────────────────────────────────────────────────────────────────────────────

As shown above, creating a StreamTarget amounts to specifying which command will be used to create a child process.
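Since the command is the main ingredient, it helps to see how Julia assembles one from backtick literals. The sketch below uses only Base Julia; the executable name `my_worker` and its `--seed` flag are hypothetical placeholders, and nothing is actually launched.

```julia
# Hypothetical worker executable; building a Cmd does not require it to exist.
worker = `my_worker --seed 1`

# Interpolating a Cmd splices in all of its arguments;
# interpolating a plain value contributes exactly one argument.
worker_command(n_trials, n_fails) =
    `$worker --model.nTrials $n_trials --model.nFails $n_fails`

cmd = worker_command(100, 50)
println(cmd.exec)  # the argument vector that would be passed to the operating system
```

Because Julia does not run commands through a shell, each interpolated value stays a single argument regardless of spaces or special characters, so model options can be passed through safely.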

To terminate the child processes associated with a stream target, use:

Pigeons.kill_child_processes(pt)