The Thornton Lab

Introducing 'demes'


Demes: a standard format for demographic models

This post discusses a recent paper resulting from a wonderful collaboration where we attempt to define a standard way to describe a specific family of models that are widely used in population genetics.

Background

Population genetics research depends heavily on software for two tasks: simulation and inference.

These two tasks are often tightly coupled. For example, we simulate data to test our inference methods.

The field’s software ecosystem is highly fragmented. Conservatively, dozens of tools exist for both tasks. To some extent, this is reasonable. The problems are hard and academic research tends to slowly grind away on them bit by bit over the years.

One challenge of this fragmentation is that most of the tools cannot interoperate. Consider the following workflow:

All of the tools just mentioned are used by research currently in progress. All of them have different ways of saying, “we want to do something with a model of an ancestral population that split into two populations in the recent past”. We have a mix of command line tools, tools with Python APIs, tools requiring input files in a bespoke format, etc.

This has been the situation for the past 20+ years. The lack of interoperability has real consequences. A lot of time can be wasted translating a model description from one tool's format to another's. The translation is error-prone, and it has led to real mistakes.

demes is an effort to do better.

Example model

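Consider the following scenario: an ancestral population of 1,000 individuals splits 100 generations ago into two derived populations. By the present day, the first has grown from 250 to 1,500 individuals and the second from 500 to 800. Starting 50 generations ago, the two derived populations exchange migrants symmetrically at a rate of 1e-5 per generation.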

Instead of learning the ms command line or the msprime demographic events API, let’s just write our model down in YAML, a standard format for data interchange:

description: population split with bottlenecks and growth
time_units: generations
demes:
  - name: ancestor
    epochs:
      - start_size: 1000
        end_time: 100
  - name: derived1
    ancestors: [ancestor]
    epochs:
      - start_size: 250
        end_size: 1500
  - name: derived2
    ancestors: [ancestor]
    epochs:
      - start_size: 500
        end_size: 800
migrations:
  - demes: [derived1, derived2]
    rate: 1e-5
    start_time: 50

The advantages of using YAML include:

The following graphic illustrates the model. Time on the y axis is measured in generations ago, with 0 indicating the present day.

A split model with growth and migration

The code to generate this graphic is:

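import matplotlib.pyplot as plt

import demes
import demesdraw

# Load the YAML model into a demes.Graph.
graph = demes.load("demes_post_model.yaml")

# Draw each deme as a "tube" whose width reflects its size through time.
tubes = demesdraw.tubes(graph)

# Save the current matplotlib figure to disk.
plt.savefig("demes_post_model.png")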
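The widths of the tubes track deme size through time. As a rough sketch of the arithmetic behind that picture, the snippet below computes the size of derived1 at a few time points. It assumes that the deme exists from 100 generations ago (the ancestor's end_time) until the present, and that the size changes exponentially within the epoch, which is our reading of the defaults in the spec. The function deme_size is a hypothetical helper written for this post. It is not part of demes or demesdraw.

```python
def deme_size(start_size, end_size, start_time, end_time, time_ago):
    """Size of a deme `time_ago` generations before the present.

    Hypothetical helper: assumes a single epoch with exponential
    size change from `start_size` (at `start_time`) to `end_size`
    (at `end_time`). Times are in generations ago.
    """
    if not end_time <= time_ago <= start_time:
        raise ValueError("time_ago is outside the epoch")
    # Fraction of the epoch that has elapsed, from 0.0 to 1.0.
    elapsed_fraction = (start_time - time_ago) / (start_time - end_time)
    return start_size * (end_size / start_size) ** elapsed_fraction


# derived1 grows from 250 to 1500 between 100 generations ago and today.
sizes_derived1 = {t: deme_size(250, 1500, 100, 0, t) for t in (100, 50, 0)}

for time_ago, size in sizes_derived1.items():
    print(f"{time_ago:>3} generations ago: {size:.1f}")
```

Halfway through the epoch the deme is at roughly 612 individuals rather than 875, which is why the tube flares out towards the present instead of widening linearly.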

Specification and software ecosystem

The complete specification of the demes format is here. The paper discusses the reasons behind several of the decisions in the spec. The paper also lists what tools currently use demes as input and/or output formats. Graham Gower has written an excellent tutorial.

demes has good language support:

Integration with fwdpy11

fwdpy11 is developed by our group. We can define demographic models using demes quite easily:

import fwdpy11

# Read the model in from the YAML file.
# The second argument (1) is the burn-in length: it tells fwdpy11
# to generate a model where the ancestral population is evolved
# for N generations, where N is its initial size.
model = fwdpy11.discrete_demography.from_demes("demes_post_model.yaml", 1)
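demes measures time backwards from the present, while a forward simulation counts generations up from zero. The sketch below shows one way to do that conversion by hand, under the assumption that the burn-in lasts burnin times the ancestral size and that the oldest event (the split) happens right after it. The function forward_time and the variable names are hypothetical and only illustrate the bookkeeping. This is not how fwdpy11 implements it internally.

```python
def forward_time(time_ago, oldest_event_time, burnin_generations):
    """Convert a time in generations ago to a forward generation.

    Hypothetical helper: assumes the oldest event in the model
    happens immediately after the burn-in ends.
    """
    return burnin_generations + (oldest_event_time - time_ago)


ancestral_size = 1000  # start_size of the ancestor deme in the YAML
burnin = 1             # the second argument passed to from_demes
burnin_generations = burnin * ancestral_size

split_time = 100       # end_time of the ancestor deme in the YAML

split = forward_time(100, split_time, burnin_generations)
migration_start = forward_time(50, split_time, burnin_generations)
present = forward_time(0, split_time, burnin_generations)

print(split, migration_start, present)  # 1000 1050 1100
```

The values 1000 and 1050 are the same generations that show up as `when` in the hand-written fwdpy11 listing in the next section.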

How the sausage used to be made

Prior to demes, the following Python code was needed to define this same model in fwdpy11:

import fwdpy11
from numpy import array

model = fwdpy11.DiscreteDemography(
    mass_migrations=[],
    set_growth_rates=[
        fwdpy11.SetExponentialGrowth(when=1000, deme=1, G=1.0180790778133073),
        fwdpy11.SetExponentialGrowth(when=1000, deme=2, G=1.0047110987876184),
    ],
    set_deme_sizes=[
        fwdpy11.SetDemeSize(when=1000, deme=0, new_size=0, resets_growth_rate=True),
        fwdpy11.SetDemeSize(
            when=1000, deme=1, new_size=250, resets_growth_rate=True
        ),
        fwdpy11.SetDemeSize(
            when=1000, deme=2, new_size=500, resets_growth_rate=True
        ),
    ],
    set_selfing_rates=[
        fwdpy11.SetSelfingRate(when=0, deme=0, S=0),
        fwdpy11.SetSelfingRate(when=1000, deme=1, S=0),
        fwdpy11.SetSelfingRate(when=1000, deme=2, S=0),
    ],
    migmatrix=fwdpy11.MigrationMatrix(
        migmatrix=array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
        scaled=False,
    ),
    set_migration_rates=[
        fwdpy11.SetMigrationRates(
            when=1000, deme=0, migrates=array([0.0, 0.0, 0.0])
        ),
        fwdpy11.SetMigrationRates(
            when=1000, deme=1, migrates=array([1.0, 0.0, 0.0])
        ),
        fwdpy11.SetMigrationRates(
            when=1000, deme=2, migrates=array([1.0, 0.0, 0.0])
        ),
        fwdpy11.SetMigrationRates(
            when=1001, deme=1, migrates=array([0.0, 1.0, 0.0])
        ),
        fwdpy11.SetMigrationRates(
            when=1001, deme=2, migrates=array([0.0, 0.0, 1.0])
        ),
        fwdpy11.SetMigrationRates(
            when=1050, deme=1, migrates=array([0.0000e00, 9.9999e-01, 1.0000e-05])
        ),
        fwdpy11.SetMigrationRates(
            when=1050, deme=2, migrates=array([0.0000e00, 1.0000e-05, 9.9999e-01])
        ),
    ],
)

It should be obvious which method to use…
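Part of the pain is that the numbers in the long listing had to be derived by hand from the quantities that the YAML states directly. As a sketch of that arithmetic, assuming exponential growth over the 100 generations since the split, the per-generation growth rate G and the migration rows can be recovered as follows. The functions growth_rate and migration_row are hypothetical helpers written for illustration. They are not fwdpy11 functions.

```python
def growth_rate(start_size, end_size, generations):
    """Per-generation multiplier G such that
    start_size * G**generations == end_size."""
    return (end_size / start_size) ** (1 / generations)


def migration_row(deme_index, source_index, rate, num_demes):
    """Row of a migration matrix for one deme: a fraction `rate` of
    parents comes from `source_index`, the rest from the deme itself."""
    row = [0.0] * num_demes
    row[source_index] = rate
    row[deme_index] = 1.0 - rate
    return row


# Sizes and times taken from the YAML model.
g1 = growth_rate(250, 1500, 100)  # derived1
g2 = growth_rate(500, 800, 100)   # derived2

# Rows for demes 1 and 2 once migration starts (rate 1e-5).
row1 = migration_row(1, 2, 1e-5, 3)
row2 = migration_row(2, 1, 1e-5, 3)

print(g1, g2)
print(row1)
print(row2)
```

Up to floating-point rounding, g1 and g2 agree with the G values in the listing, and the rows agree with the migrates arrays set at generation 1050.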

Future directions

fwdpy11 is moving towards a situation where demes graphs are the core object for evolving models of discrete populations. As of version 0.19.0, the Rust implementation of demes is compiled into the core library. (That is a neat story on its own, as the back-end is now a C++ dynamic library that is partially written in Rust.)

The types found in the very long code listing above will be deprecated in favor of directly importing either YAML files or demes.Graph objects from Python. I am guesstimating that the changes will result in a few thousand lines of C++ and Python being deleted.

Code

Here is the code for the above examples.