Introduction to Mount Etna

Mount Etna is a set of applications that together provide tools for storing, organizing, viewing and analyzing research data sets.

How to organize data

Research science involves producing complex, highly structured datasets. Meticulous organization of this data is crucial to the scientific enterprise: an experiment is easily disrupted when data integrity is compromised, so producing an error-free dataset can determine whether the experiment succeeds. How to produce this organization is therefore a critical research function.

In addition to the basic organizational needs of the experiment, science has an important social component: knowledge is produced through collective consensus, which requires the experimental data and methods to be available to the community of researchers for validation and follow-up research.

Producing this organization and facilitating this sharing is the fundamental goal of the data library. To that end, we aim to provide a set of tools that make it easy to manage all aspects of the data life cycle.

But what is encapsulated in this nebulous word “data”? In the modern biomedical research setting (the main focus of the data library project), experimental data usually consists of measurements made on living tissue taken from study subjects, which may be human patients, cell lines, or model organisms. The measurements may be simple, as in the case of a clinical lab summarized by a single metric, or complex, as in the case of a sequencing experiment collecting data about millions of individual molecules.

In all cases, our goal is the same: to organize data so that each measurement is correctly associated with its point of origin in the experimental protocol. Our basic approach to this organization in the data library is to describe the dataset as a collection of experimental entities to which particular items of data attach. For example, a study may contain a set of subjects, each of whom produces a set of associated samples, upon which a set of experimental assays may be performed. Each of these entities (the subject, the sample, the assay) may have particular items of data associated with it: the subject may have demographic data such as age and sex, while the assay may have quality control values. When describing an entity, we wish to associate with it all of the data describing its attributes, and to record its relationships to other entities.
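
The subject, sample, and assay example above can be sketched as a small data structure. This is only an illustration of the idea, not the data library's actual schema or API: the Entity class, its field names, the identifiers (S1, S1.T1, S1.T1.rna) and the attribute values are all hypothetical and made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical experimental entity: its own attributes plus links to related entities."""
    kind: str                                   # e.g. "subject", "sample", "assay"
    name: str                                   # identifier for this entity
    attributes: dict = field(default_factory=dict)
    # The entity this one originates from (None for a top-level entity).
    parent: "Entity | None" = field(default=None, repr=False, compare=False)
    # Entities derived from this one.
    children: list = field(default_factory=list, repr=False, compare=False)

    def add_child(self, kind, name, **attributes):
        """Create a derived entity and record the relationship in both directions."""
        child = Entity(kind, name, attributes, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Follow parent links back to the point of origin, listed origin first."""
        node, path = self, []
        while node is not None:
            path.append(f"{node.kind}:{node.name}")
            node = node.parent
        return list(reversed(path))


# Made-up illustration values, not real study data.
subject = Entity("subject", "S1", {"age": 54, "sex": "F"})
sample = subject.add_child("sample", "S1.T1", tissue="tumor")
assay = sample.add_child("assay", "S1.T1.rna", qc_passed=True)

print(assay.lineage())
# ['subject:S1', 'sample:S1.T1', 'assay:S1.T1.rna']
```

Each item of data lives on the entity it describes (demographics on the subject, quality control values on the assay), and the parent links let any assay be traced back to the sample and subject it came from.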

The fundamental unit of this organization in the data library is the project, which describes a single experiment or research study conducted by a team of researchers.

The basic functions of the data library