Targeting a non-Julian model

Suppose you have some code implementing vanilla MCMC, written in an arbitrary "foreign" language such as C++, Python, R, or Java. You would like to turn this vanilla MCMC code into a parallel tempering (PT) algorithm able to harness large numbers of cores, including distributing the algorithm over MPI. However, you do not wish to learn anything about MPI, multi-threading, or parallel tempering.

Surprisingly, it is very simple to bridge such code with Pigeons. The only requirement on the "foreign" language is that it supports reading from standard input and writing to standard output, so virtually any language can be interfaced in this fashion. Based on this minimalist "standard stream bridge" with worker processes running foreign code (one such process per replica; not necessarily running on the same machine), Pigeons will coordinate the execution of an adaptive non-reversible parallel tempering algorithm.

This behaviour is implemented in StreamTarget; see its documentation for details. In a nutshell, there will be one child process for each PT chain. These processes will not necessarily be on the same machine: indeed, distributed sampling is the key use case of this bridge. Pigeons will do some lightweight coordination with these child processes to orchestrate non-reversible parallel tempering. Interprocess communication only involves Pigeons telling each child process to perform exploration at a Pigeons-provided annealing parameter.

StreamTarget implements log_potential and explorer by invoking worker processes via standard stream communication. Standard streams are less efficient than alternatives such as protobuf, but they have the advantage of being supported by nearly all programming languages in existence. Moreover, in many practical cases the communication is unlikely to be the bottleneck, since the worker process is invoked only three times per chain per iteration (the overhead is on the order of 0.1 ms per interprocess call).
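To make the worker side of the bridge concrete, here is a minimal sketch of a foreign-language worker, written in Python. It keeps one MCMC state in memory and answers two kinds of requests read from standard input: evaluate the annealed log density at a given annealing parameter, and perform exploration at that parameter. The request names (`log_potential(beta)`, `call_sampler!(beta)`), the `response(...)` reply format, the toy normal model, and all function names are assumptions made for illustration; the StreamTarget documentation is the authority on the actual message format.

```python
import math
import random
import sys


def log_prior(x):
    # Standard normal reference, up to an additive constant (toy choice).
    return -0.5 * x * x


def log_likelihood(x):
    # Toy likelihood centred at 2.0, purely illustrative.
    return -0.5 * (x - 2.0) ** 2


def annealed_log_density(x, beta):
    # beta = 0 gives the reference, beta = 1 the full target.
    return log_prior(x) + beta * log_likelihood(x)


def explore(x, beta, rng, n_steps=10):
    # Vanilla random-walk Metropolis targeting the annealed density.
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, 1.0)
        log_ratio = (annealed_log_density(proposal, beta)
                     - annealed_log_density(x, beta))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
    return x


def handle(line, state, rng):
    # Parse one request of the assumed form "name(beta)"; return the reply.
    name, _, rest = line.strip().partition("(")
    beta = float(rest.rstrip(")"))
    if name == "log_potential":
        value = annealed_log_density(state["x"], beta)
        return "response({})".format(value)
    if name == "call_sampler!":
        state["x"] = explore(state["x"], beta, rng)
        return "response()"
    raise ValueError("unknown request: " + line)


def serve(stdin=sys.stdin, stdout=sys.stdout, seed=1):
    # Answer requests until the input stream closes.
    rng = random.Random(seed)
    state = {"x": 0.0}
    for line in stdin:
        if line.strip():
            # Flush after every reply so the parent is never left waiting.
            print(handle(line, state, rng), file=stdout, flush=True)
```

A worker script would end with a call to `serve()`. Passing in-memory streams instead of the real standard streams makes the request handling easy to check without Pigeons.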

Usage example

To demonstrate this capability, we show here how it enables running Blang models in Pigeons. Blang is a Bayesian modelling language designed for sampling combinatorial spaces such as phylogenetic trees.

We first set up Blang as follows (assuming Java 11 is accessible via the PATH variable):

using Pigeons

Pigeons.setup_blang("blangDemos")
Cloning into 'blangDemos'...
Downloading https://services.gradle.org/distributions/gradle-5.2.1-bin.zip
...................................................................................

Welcome to Gradle 5.2.1!

Here are the highlights of this release:
 - Define sets of dependencies that work together with Java Platform plugin
 - New C++ plugins with dependency management built-in
 - New C++ project types for gradle init
 - Service injection into plugins and project extensions

For more details see https://docs.gradle.org/5.2.1/release-notes.html

Starting a Gradle Daemon (subsequent builds will be faster)

> Task :generateXtext
WARNING:The use of wildcard imports is deprecated. (file:/home/runner/.pigeons/blangDemos/src/main/java/demos/PhylogeneticTree.bl line : 2 column : 1)

> Task :compileJava
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :processResources
> Task :classes
> Task :jar
> Task :startScripts
> Task :installDist

BUILD SUCCESSFUL in 51s
6 actionable tasks: 6 executed

Next, we run a Blang implementation of our usual unidentifiable toy example:

using Pigeons

blang_unidentifiable_example(n_trials, n_successes) =
    Pigeons.BlangTarget(
        `$(Pigeons.blang_executable("blangDemos", "demos.UnidentifiableProduct")) --model.nTrials $n_trials --model.nFails $n_successes`
    )
pt = pigeons(target = blang_unidentifiable_example(100, 50))
┌ Info: Neither traces, disk, nor online recorders included.
   You may not have access to your samples (unless you are using a custom recorder, or maybe you just want log(Z)).
   To add recorders, use e.g. pigeons(target = ..., record = [traces; record_default()])
Preprocess {
  2 samplers constructed with following prototypes:
    RealScalar sampled via: [RealSliceSampler]
} [ endingBlock=Preprocess blockTime=492.8ms blockNErrors=0 ]
Inference {
  ────────────────────────────────────────────────────────────────────────────
  scans        Λ        time(s)    allc(B)  log(Z₁/Z₀)   min(α)     mean(α)
────────── ────────── ────────── ────────── ────────── ────────── ──────────
        2      0.992      0.427   1.62e+07      -4.05      0.621       0.89
        4       1.42      0.046   6.37e+05      -4.43      0.192      0.842
        8       1.44      0.071   7.21e+05      -5.35      0.581       0.84
       16        1.7      0.101   1.42e+06      -4.72      0.577      0.811
       32       1.51      0.188    2.8e+06       -4.8      0.602      0.833
       64       1.52      0.345   5.52e+06      -4.95      0.727      0.831
      128       1.56      0.487   1.09e+07      -4.93      0.713      0.827
      256       1.52      0.892   2.18e+07      -5.06       0.78      0.831
      512       1.52       1.44   4.34e+07      -4.97      0.793      0.831
 1.02e+03       1.54       2.58   8.66e+07      -4.97      0.789      0.829
────────────────────────────────────────────────────────────────────────────

As shown above, creating a StreamTarget amounts to specifying which command will be used to create a child process.

To terminate the child processes associated with a stream target, use:

Pigeons.kill_child_processes(pt)
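
The lifecycle described in this section (one command creates one child process, requests and replies travel over its standard streams, and the child is terminated at the end) can be mimicked outside Pigeons. The following Python sketch is only an analogy, not how Pigeons is implemented: the stand-in worker, the message strings, and the helper names `start_worker`, `request`, and `stop_worker` are all hypothetical.

```python
import subprocess
import sys

# Stand-in worker (hypothetical): answers each request line with
# "response(<request>)". Any command that reads standard input and
# writes standard output could take its place.
WORKER_COMMAND = [
    sys.executable, "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('response(' + line.strip() + ')', flush=True)\n",
]


def start_worker(command):
    # Launch one child process connected to the parent through pipes.
    return subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,  # line-buffered on the parent side
    )


def request(worker, message):
    # Send one line, then block until the worker replies with one line.
    worker.stdin.write(message + "\n")
    worker.stdin.flush()
    return worker.stdout.readline().strip()


def stop_worker(worker):
    # Terminate the child, wait for it to exit, then release the pipes.
    worker.terminate()
    worker.wait()
    worker.stdin.close()
    worker.stdout.close()
```

In this sketch, `start_worker(WORKER_COMMAND)` plays the role of specifying the command, each `request` call is one interprocess round trip, and `stop_worker` corresponds to cleaning up the child process once sampling is done.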