Data Warehouse


The purpose of Magma is to define a data graph for each Etna project: a set of models for each of the entities in the project dataset.

Each model has a set of attributes, which fall broadly into two groups: value types, which hold data such as an integer or a reference to a binary file, and link types (e.g., parent or collection), which define relationships between the models in Magma.

Attached to each model is a collection of records, each containing a data value for every attribute, including links to other records and to files stored on Metis.


The shape of the data graph (and thus the relationships defined in each of the models) is subject to some constraints:

Here is a sketch of what the graph for the “olympics” project might look like:

project {
  identifier project_name
  collection event
  collection athlete
}

event {
  parent project
  identifier event_name
  table entry
}

entry {
  parent event
  link athlete
  integer placement
  integer score
}

athlete {
  parent project
  identifier name
  collection entry
}

Here all links are reciprocal, and every model descends from the project. However, using the link attribute we may indicate other one-to-one or one-to-many relationships, which allows the graph to be more like a directed acyclic graph (DAG) than a tree.
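The claim that every model descends from the project can be checked mechanically. The following is a minimal plain-Ruby sketch, not Magma code: the `models` hash is a hypothetical hand-written transcription of the olympics sketch, and `ancestry` and `single_root?` are illustrative helper names.

```ruby
# Hypothetical transcription of the olympics sketch: each model lists its
# parent (nil for the root) alongside its other attributes, grouped by type.
models = {
  'project' => { parent: nil,       identifier: 'project_name', collection: %w[event athlete] },
  'event'   => { parent: 'project', identifier: 'event_name',   table: %w[entry] },
  'entry'   => { parent: 'event',   link: %w[athlete], integer: %w[placement score] },
  'athlete' => { parent: 'project', identifier: 'name',         collection: %w[entry] }
}

# Follow parent attributes upward and return the path from a model to its root.
def ancestry(models, model_name)
  path = [model_name]
  while (parent = models.fetch(path.last)[:parent])
    raise ArgumentError, "cycle detected at #{parent}" if path.include?(parent)
    path << parent
  end
  path
end

# True when every model's parent chain ends at the given root model.
def single_root?(models, root)
  models.keys.all? { |name| ancestry(models, name).last == root }
end

puts ancestry(models, 'entry').join(' -> ')  # prints "entry -> event -> project"
puts single_root?(models, 'project')         # prints "true"
```

Note that only the parent attributes form the tree; the `link athlete` attribute on entry is the extra edge that makes the overall graph DAG-like.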

Models may also specify a dictionary model and mapping to be used for more complex validations (see below on dictionary validation).


Each attribute has, at least, a unique attribute name within its model, and a distinct attribute type.

The Magma attribute types are:

Value types

Outgoing link types

Incoming link types

Other types

In addition to its type, each attribute may set several other fields:


A record is a set of values, one for each attribute in the model. The set of records for a project forms a data graph with a single project record at the root.


Magma models may define validations, which help ensure the integrity of data as it enters Magma (invalid data is rejected). Magma has two basic forms of validation. The first is attribute validation, which adds matchers to each attribute on a model, e.g.:

class MyModel < Magma::Model
  attribute :att1, type: String, match: /[r]egexp/
  attribute :att2, type: String, match: [ 'list', 'of', 'options' ]
end
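To illustrate what the two matcher styles mean, here is a rough plain-Ruby sketch of how such a check could behave. It is not Magma's implementation; `matches_attribute?` is a hypothetical helper, and treating a missing matcher as "accept anything" is an assumption.

```ruby
# Hypothetical helper: test a value against an attribute's `match` option.
# A Regexp must match the value, an Array must include it, and (by assumption)
# an attribute without a matcher accepts any value.
def matches_attribute?(matcher, value)
  case matcher
  when Regexp then matcher.match?(value.to_s)
  when Array  then matcher.include?(value)
  when nil    then true
  else raise ArgumentError, "unsupported matcher: #{matcher.inspect}"
  end
end

puts matches_attribute?(/[r]egexp/, 'my regexp')        # prints "true"
puts matches_attribute?(%w[list of options], 'of')      # prints "true"
puts matches_attribute?(%w[list of options], 'other')   # prints "false"
```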


An alternative method of validation is via a Magma::Dictionary, which allows a model to be validated using data (records) from another model.

You may define a dictionary relation as follows:

class MyModel < Magma::Model
  attribute :att1
  attribute :att2

  dictionary DictModel, att1: :dict_att1, att2: :dict_att2
end

A my_model record is valid if there is a matching entry in the dictionary model, i.e., one where my_record.att1 matches dict_record.dict_att1 and my_record.att2 matches dict_record.dict_att2. Here ‘match’ might simply mean ‘equality’, but a dictionary may also include a ‘match’ attribute.

class DictModel < Magma::Model
  attribute :dict_att1
  match :dict_att2
end

A match attribute contains JSON data of the form { type, value }. This allows us to construct more complex entries:

# match any value in this Array
{ type: 'Array', value: [ 'x', 'y', 'z' ] }
# match within this Range
{ type: 'Range', value: [ 0, 100 ] }
# match this Regexp
{ type: 'Regexp', value: '^something$' }
# match an ordinary value
{ type: 'String', value: 'something' }
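As a rough illustration of how the four entry types could be evaluated, here is a plain-Ruby sketch. It is not taken from Magma: `entry_matches?` is a hypothetical helper, and treating the Range bounds as inclusive is an assumption.

```ruby
# Hypothetical evaluator for a { type, value } match entry.
def entry_matches?(entry, candidate)
  value = entry[:value]
  case entry[:type]
  when 'Array'  then value.include?(candidate)
  # Assumption: both ends of the range are inclusive.
  when 'Range'  then candidate.is_a?(Numeric) && candidate.between?(value[0], value[1])
  when 'Regexp' then Regexp.new(value).match?(candidate.to_s)
  when 'String' then value == candidate
  else raise ArgumentError, "unknown match type: #{entry[:type].inspect}"
  end
end

puts entry_matches?({ type: 'Array',  value: %w[x y z] },     'y')          # prints "true"
puts entry_matches?({ type: 'Range',  value: [0, 100] },      101)          # prints "false"
puts entry_matches?({ type: 'Regexp', value: '^something$' }, 'something')  # prints "true"
puts entry_matches?({ type: 'String', value: 'something' },   'other')      # prints "false"
```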



The main way to interact with Magma is via the API. In the simplest case, you can POST with curl or wget, provided you set the Authorization: Etna <your token> header; any other HTTP client will work as well. Visit your Janus instance to get a current token, or use one of the other ways to authenticate with Janus.

To use Etna::Client to connect to Magma in Ruby, you may gem install etna and then create a new client:

require 'etna'

e = Etna::Client.new('', ENV['TOKEN'])

payload = e.retrieve(project_name: 'labors', model_name: 'monster', record_names: [ 'Nemean Lion', 'Lernean Hydra' ], attribute_names: "all")


The main way to interact with Magma directly is via its API (you may also perform many of the same operations using the data browser, Timur).

There are four main endpoints: update, retrieve, query, and update_model. All of them expect a POST in JSON format with a valid Etna authorization header (i.e., Authorization: Etna <valid janus token>).
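A request of this kind can be assembled with Ruby's standard library alone. The sketch below builds (but does not send) such a POST; the host `https://magma.example.org`, the `/<endpoint>` path layout, and the helper name `build_magma_request` are assumptions for illustration, not details confirmed by this document.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build a JSON POST carrying the Etna authorization header.
# Assumption: each endpoint lives at "/<endpoint>" on the Magma host.
def build_magma_request(host, endpoint, token, params)
  uri = URI.join(host, "/#{endpoint}")
  request = Net::HTTP::Post.new(uri)
  request['Authorization'] = "Etna #{token}"
  request['Content-Type']  = 'application/json'
  request.body = JSON.generate(params)
  request
end

request = build_magma_request(
  'https://magma.example.org',            # hypothetical host
  'retrieve',
  ENV.fetch('TOKEN', 'example-token'),    # a Janus token in real use
  project_name: 'labors', model_name: 'labor',
  record_names: ['Nemean Lion'], attribute_names: 'all'
)

puts request.path  # prints "/retrieve"
```

To actually send it, pass the request to an open connection, for example `Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) { |http| http.request(request) }`.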




The basic revision format looks like this:

{
  "project_name" : "labors",
  "revisions" : {
    "monster" : {
      "Nemean Lion" : {
        "species" : "lion"
      },
      "Lernean Hydra" : {
        "species" : "hydra"
      }
    }
  }
}

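The nesting of the revision format (model, then record identifier, then attribute) is easy to get wrong by hand, so it can help to build it programmatically. A minimal sketch, assuming a hypothetical helper named `build_update`:

```ruby
require 'json'

# Hypothetical helper: wrap per-record attribute changes in the revision format.
# `changes` maps a record identifier to a hash of attribute name => new value.
def build_update(project_name, model_name, changes)
  {
    'project_name' => project_name,
    'revisions'    => { model_name => changes }
  }
end

body = build_update(
  'labors', 'monster',
  'Nemean Lion'   => { 'species' => 'lion' },
  'Lernean Hydra' => { 'species' => 'hydra' }
)

# The JSON text that would be POSTed to the update endpoint.
puts JSON.pretty_generate(body)
```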


Required parameters:

Optional parameters:


A basic request for a record looks like this:

{
  "project_name"    : "labors",
  "model_name"      : "labor",
  "record_names"    : [ "Nemean Lion" ],
  "attribute_names" : [ "name", "number", "completed" ]
}

The output is in “payload” format: a hash { models } keyed by model_name, which holds { documents, template } for each model. The template is a complete description of the model, sufficient for import into another Magma instance. The returned documents are keyed by record identifier, with each record containing values for the attributes requested in attribute_names.

{
  "models": {
    "labor": {
      "documents": {
        "Nemean Lion": {
          "name": "Nemean Lion",
          "number": 1,
          "completed": true
        }
      },
      "template": {
        "name": "labor",
        "attributes": {
          "name": {
            "name": "name",
            "type": "String",
            "attribute_class": "Magma::Attribute",
            "display_name": "Name",
            "shown": true
          }
          // etc. for ALL attributes, not just requested
        },
        "identifier": "name",
        "parent": "project"
      }
    }
  }
}

A few special cases exist. Here is the “template” query, which will retrieve all of the project templates but no documents:

{ "project_name": "labors", "model_name": "all", "record_names":[], "attribute_names": "all" }

The “identifier” query will retrieve all of the project identifiers at once:

{ "project_name": "labors", "model_name": "all", "record_names": "all", "attribute_names": "identifier" }
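To make the payload format concrete, here is a small sketch that walks a retrieve response using only Ruby's standard library. The sample text is trimmed from the labor payload shown earlier (the template's attributes are omitted for brevity); nothing here depends on Magma itself.

```ruby
require 'json'

# Sample response body, trimmed from the payload format described above.
raw = <<~PAYLOAD
  {
    "models": {
      "labor": {
        "documents": {
          "Nemean Lion": { "name": "Nemean Lion", "number": 1, "completed": true }
        },
        "template": { "name": "labor", "identifier": "name", "parent": "project" }
      }
    }
  }
PAYLOAD

payload   = JSON.parse(raw)
model     = payload.fetch('models').fetch('labor')
id_name   = model.fetch('template').fetch('identifier')  # attribute that names each record
documents = model.fetch('documents')

documents.each do |record_name, document|
  puts "#{record_name}: #{id_name}=#{document[id_name]}, number=#{document['number']}"
end
```

Using `fetch` rather than `[]` makes a missing model or key fail loudly instead of silently yielding nil.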


The Magma Query API lets you pull data out of Magma through an expressive query interface.


A general form of the query is:

[ *predicate_args, *verb_args, *predicate_args, *verb_args, ... ]

A basic query might look like this:

[ 'labor', '::all', 'name' ]

A breakdown of the terms:

labor - specifies the model we wish to search, yielding a model predicate
::all - a verb argument to the model predicate, iterating across all of the items in the model and yielding a record predicate
name - a verb argument to the record predicate, specifying an attribute name, yielding a value and terminating the query

A query must eventually terminate in a value (or an array of values, if an array argument is passed to a record predicate), but it may first traverse the graph through linked records:

[ 'labor', '::all', 'monster', 'victim', '::first', 'city' ]

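The response to the basic query [ 'labor', '::all', 'name' ] pairs each record identifier with the requested value:

{
  "answer" : [ [ "Nemean Lion", "Nemean Lion" ], [ "Lernean Hydra", "Lernean Hydra" ] ],
  "format" : [ "labors::labor#name", "labors::labor#name" ]
}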

The format describes the returned values. If the format is an array, the answer contains a list of items, each laid out as the format describes. Individual entries are usually written as project_name::model_name#attribute_name.
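A client can use the format to interpret the answer. The following sketch, in plain Ruby, splits a format string into its parts and turns the answer rows into a lookup table; `parse_format` is a hypothetical helper, and the response hash is copied from the example above.

```ruby
# Split a format string such as "labors::labor#name" into its three parts.
def parse_format(format)
  project, rest    = format.split('::', 2)
  model, attribute = rest.split('#', 2)
  { project: project, model: model, attribute: attribute }
end

response = {
  'answer' => [['Nemean Lion', 'Nemean Lion'], ['Lernean Hydra', 'Lernean Hydra']],
  'format' => ['labors::labor#name', 'labors::labor#name']
}

# Each answer row is an [identifier, value] pair, so to_h builds a lookup table.
values_by_record = response['answer'].to_h
key_format, value_format = response['format'].map { |f| parse_format(f) }

puts values_by_record['Lernean Hydra']  # prints "Lernean Hydra"
puts value_format[:attribute]           # prints "name"
```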

A more advanced query might include a filter:

[ 'monster', [ '::has', 'stats' ], '::all', 'name' ]

Filters may be applied to any model we traverse through:

[ 'labor', '::all', 'prize', [ 'worth', '::>', 200 ], '::first', 'name' ]


There are a handful of predicate types, each of which takes various arguments.


The first predicate initiates the query and usually takes a model name as an argument:

<model_name> - a string specifying the model to be searched

You may also pass the following as initial arguments:

::predicates - return a list of the available predicates and their verbs
::model_names - return a list of model names for the project being queried

A Model predicate is our query starting point and specifies a set of records. Model predicates can accept an arbitrary number of filter [] arguments, followed by:

::first - reduce this model to a single item
::all - return a vector of values for this model, labeled with this model's identifiers
::attribute_names - return a list of attribute names for this model (only following the start predicate)

A Record predicate follows after a Model predicate. The valid arguments are:

<attribute_name> - a string specifying an attribute on this model
::has, <attribute_name> - a boolean test for the existence of <attribute_name> (i.e., the data is not null)
::identifier - an alias for the attribute_name of this Model's identifier. E.g., if a Sample has identifier attribute 'sample_name', '::identifier' will return the same value as 'sample_name'

Column attributes usually just return their value. However, you may optionally follow them with arguments to apply a boolean test.

string

::equals, <string> - A boolean test for equality, e.g., [ 'sample_name', '::equals', 'Dumbo' ]
::in, [ list of strings ] - A boolean test for membership, e.g., [ 'sample_name', '::in', [ 'ant', 'bear', 'cat' ] ]
::matches, <string> - A boolean test for a regular expression match, e.g., [ 'sample_name', '::matches', '[GD]umbo' ]

integer, date_time

::<= - less than or equals
::< - less than
::>= - greater than or equals
::> - greater than
::= - equals


boolean

::true - is true
::false - is false

file, image

::url - a URL to retrieve this file resource
::path - the filename/path for this file resource


matrix

::slice - retrieve a subset of columns from the matrix

Example Queries

Using the examples above, you could formulate a query using a POST request and the following JSON payload:

{
  "query": [ "labor", "::all", "monster", "name" ],
  "project": "labors"
}

Results in something like:

{
  "answer" : [ [ "Nemean Lion", "Nemean Lion" ], [ "Lernean Hydra", "Lernean Hydra" ] ],
  "format" : [ "labors::labor#name", "labors::monster#name" ]
}

To get a TSV back, you could add the format=tsv parameter:

{
  "query": [ "labor", "::all", "monster", "name" ],
  "project": "labors",
  "format": "tsv"
}

Results in:

Nemean Lion\tNemean Lion
Lernean Hydra\tLernean Hydra
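On the client side, a TSV body like this can be split back into rows and cells. A minimal sketch in plain Ruby follows; `parse_tsv` is a hypothetical helper, and it deliberately ignores quoting, so it assumes no cell contains a tab or newline.

```ruby
# Split a tab-separated body into an array of rows, each an array of cells.
# The -1 limit keeps trailing empty cells instead of dropping them.
def parse_tsv(body)
  body.each_line.map { |line| line.chomp.split("\t", -1) }
end

body = "Nemean Lion\tNemean Lion\nLernean Hydra\tLernean Hydra\n"
rows = parse_tsv(body)

rows.each { |labor, monster| puts "#{labor} fought #{monster}" }
```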

This results in a two-column response. If you want to provide alternate column labels for the TSV, you can supply user_columns:

{
  "query": [ "labor", "::all", "monster", "name" ],
  "project": "labors",
  "format": "tsv",
  "user_columns": ["Labor", "Roar"]
}

Results in:

Labor\tRoar
Nemean Lion\tNemean Lion
Lernean Hydra\tLernean Hydra

Transposing the request:

{
  "query": [ "labor", "::all", "monster", "name" ],
  "project": "labors",
  "format": "tsv",
  "user_columns": ["Labor", "Roar"],
  "transpose": true
}

Results in:

Labor\tNemean Lion\tLernean Hydra
Roar\tNemean Lion\tLernean Hydra
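The effect of transpose can be reproduced locally with Ruby's `Array#transpose`. This sketch assumes the un-transposed table starts with a header row holding the user_columns labels, which is inferred from the transposed output above rather than stated by the document.

```ruby
# Assumed un-transposed table: a header row of user_columns, then one row per record.
rows = [
  ['Labor', 'Roar'],
  ['Nemean Lion', 'Nemean Lion'],
  ['Lernean Hydra', 'Lernean Hydra']
]

# Array#transpose swaps rows and columns, so each column label now leads its own row.
transposed = rows.transpose
transposed.each { |row| puts row.join("\t") }
```

The two printed lines have the same cells, in the same order, as the transposed result shown above.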

And expanding a matrix attribute (contributions):

{
  "query": [ "labor", "::all", "contributions", "::slice", ["Athens", "Sparta"] ],
  "project": "labors",
  "format": "tsv",
  "user_columns": ["Labor", "Share"],
  "expand_matrices": true
}

Results in:

Nemean Lion\t10\t20
Lernean Hydra\t11\t21

Unexpanded, the data for the matrix attribute will nest into a single cell:

{
  "query": [ "labor", "::all", "contributions", "::slice", ["Athens", "Sparta"] ],
  "project": "labors",
  "format": "tsv",
  "user_columns": ["Labor", "Share"]
}

Results in:

Nemean Lion\t[10,20]
Lernean Hydra\t[11,21]
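If you receive the unexpanded form, the nested cell can still be unpacked on the client. A minimal sketch, assuming the nested cell is a JSON array as in the rows above; `expand_row` is a hypothetical helper and not part of Magma.

```ruby
require 'json'

# Replace any cell holding a JSON array with one cell per element;
# every other cell is passed through unchanged.
def expand_row(row)
  row.flat_map do |cell|
    parsed = begin
      JSON.parse(cell)
    rescue JSON::ParserError
      nil
    end
    parsed.is_a?(Array) ? parsed.map(&:to_s) : [cell]
  end
end

puts expand_row(['Nemean Lion', '[10,20]']).join("\t")
puts expand_row(['Lernean Hydra', '[11,21]']).join("\t")
```

The printed rows have the same cells as the expanded result shown earlier.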

You can also transpose expanded matrices:

{
  "query": [ "labor", "::all", "contributions", "::slice", ["Athens", "Sparta"] ],
  "project": "labors",
  "format": "tsv",
  "user_columns": ["Labor", "Share"],
  "expand_matrices": true,
  "transpose": true
}

Results in:

Labor\tNemean Lion\tLernean Hydra


Coming soon.



Start with a basic git checkout:

$ git clone

Magma is a Rack application, which means you can run it using any Rack-compatible server (e.g. Puma or Passenger).


Magma has a single YAML config file, config.yml; DO NOT TRACK this file, as it will hold all of your secrets. It uses the Etna::Application configuration syntax. See config.yml.template for an example configuration.


Magma tries to keep its models and the database schema strictly in sync by suggesting migrations. These are written in the Sequel ORM’s migration language, not pure SQL, so they are fairly straightforward to amend when Magma plans incorrectly.

To plan a new set of migrations, the first step is to amend your models. This also works in the case of entirely new models. Simply sketch them out as described above, setting out the attributes each model requires and creating links between them.

Once you’ve defined your models, you can execute bin/magma plan to create a new migration. To restrict your plan to a single project, run bin/magma plan <project_name>. Magma will output Ruby code for a migration using the Sequel ORM; you can save this in your project’s migration folder (e.g., project/my_project/migration/01_initial_migration.rb).

After your migrations are in place, you can run them using bin/magma migrate, which will attempt to run any migrations that have not been run yet. If you change your mind, you can roll back (depending on how reversible your migration is) using bin/magma migrate <migration version number>.