Noronha’s Data Model

This section describes how Noronha stores its metadata in MongoDB and how these metadata documents relate to each other. Understanding it will help you create and manipulate projects, models and other objects in Noronha.

The diagram below gives an overview of the document relationships described in detail in this section:

[Figure: data model diagram (../_images/data_model.png)]

This guide adopts the following conventions for representing document fields that link to other documents:

  • Referenced document: fields in bold are like pointers. Their content is always consistent with the original document they refer to. Fields like these are meant to answer questions like: which model is my project using? Which model files are expected?

  • Embedded document: fields in italic are like snapshots. Their content records how the referenced document looked at the moment the field was last updated. Fields like these are meant to answer questions like: which version of my project’s code was used in that training?
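To make the difference concrete, here is a minimal sketch using plain Python dictionaries instead of MongoDB. The collection layout, the names (`iris-clf`, `botanist`, `exp-1`) and the choice of which field is referenced or embedded are assumptions made for illustration only, not Noronha's actual storage code.

```python
import copy

# Hypothetical in-memory stand-in for a collection of model documents.
models = {"iris-clf": {"name": "iris-clf", "desc": "first draft"}}

# Referenced: the project keeps only the model's name and resolves it on read.
project = {"name": "botanist", "model": ["iris-clf"]}

# Embedded: the training keeps a copy of the build version taken at that moment.
build_version = {"tag": "latest", "git_version": "abc123"}
training = {"name": "exp-1", "bvers": copy.deepcopy(build_version)}

# Later on, both original documents change...
models["iris-clf"]["desc"] = "tuned"
build_version["git_version"] = "def456"

# ...the reference follows the original, the snapshot does not.
resolved = [models[name] for name in project["model"]]
print(resolved[0]["desc"])               # prints: tuned
print(training["bvers"]["git_version"])  # prints: abc123
```

The snapshot is what lets you answer "what exactly ran back then?" even after the project has been rebuilt.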

Project

Represents a project that is managed by the framework. Also referred to as proj.

{
    name: name-of-the-project  # only alphanumerical and dashes
    desc: free text description
    model: list of models used by this project
    home_dir: local directory where the project is hosted
    git_repo: the project's remote Git repository
    docker_repo: the project's remote Docker repository
    # see project repositories
}
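The naming rule in the comment above (only alphanumerical and dashes) applies to most documents in this guide. A rough way to check a name before creating a document could look like the sketch below. The helper `is_valid_name` is hypothetical, and the exact pattern Noronha enforces may differ (for example regarding length or leading dashes).

```python
import re

# Assumption: one or more ASCII letters, digits or dashes, and nothing else.
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name consists of letters, digits and dashes."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("my-project-1"))  # dashes and digits are fine
print(is_valid_name("my_project"))    # underscore is rejected
```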

Build Version

Represents the Docker image that was created when the project was built by Noronha. Also referred to as bvers or bv (not to be mistaken for beavers :D).

{
    tag: Docker tag
    proj: the project which was built
    docker_id: the Docker hash associated with the image that was created
    git_version: the Git hash associated with the last commit before the project was built
    built_at: date and time when it was built
    built_from: either 'local', 'git' or 'pre-built' (determined by the build command)
}

Model

Represents a model that is managed by the framework.

{
    name: name-of-the-model  # only alphanumerical and dashes
    desc: free text description
    model_files: list of file docs. These files compose the model's persistence
    data_files:  list of file docs. These files compose a dataset for training the model
}

# btw, this is how a file doc is defined:
{
    name: file.extension
    desc: free text description
    required: if true, this file can never be left out
    max_mb: maximum file size in MB. Not necessary, but good to know
}

Note that this is not a model version, but a model definition: it’s like a template that describes how a model is going to be persisted. Of course, when starting a project we usually have no clue what the model is going to look like, but don’t worry: all properties except the model’s name can be edited later.
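Since a model definition works as a template, one natural use for it is checking a directory of files against the list of file docs. The sketch below shows the idea. The function `check_files` is hypothetical, and treating max_mb as a soft limit that only produces a note is an assumption, since this guide describes that field as informational.

```python
import os
import tempfile


def check_files(file_docs, dir_path):
    """Compare a directory against a list of file docs and return notes."""
    notes = []
    for doc in file_docs:
        path = os.path.join(dir_path, doc["name"])
        if not os.path.isfile(path):
            # Only required files are reported when absent.
            if doc.get("required", False):
                notes.append(f"missing required file: {doc['name']}")
            continue
        max_mb = doc.get("max_mb")
        size_mb = os.path.getsize(path) / (1024 * 1024)
        if max_mb is not None and size_mb > max_mb:
            notes.append(f"larger than expected: {doc['name']}")
    return notes


# Usage with a throwaway directory containing a single (empty) model file.
model_files = [
    {"name": "clf.pkl", "required": True, "max_mb": 1},
    {"name": "vocab.json", "required": True},
    {"name": "notes.txt", "required": False},
]
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "clf.pkl"), "wb").close()
    print(check_files(model_files, tmp))
```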

Dataset

Represents a dataset that is managed by the framework. Also referred to as ds (not a data scientist though :D).

{
    name: name-of-the-dataset  # only alphanumerical and dashes
    model: the model to which this dataset belongs
    stored: if true, the dataset files are stored in Noronha's file manager
    details: dictionary with arbitrary details about the dataset
    compressed: if true, all dataset files are compressed into a single tar.gz file
    lightweight: if true, the dataset files are stored in a lightweight file storage
}

Training

Represents the execution of a training. Also referred to as train (not the one that runs on rails :D).

{
    name: name-of-the-training  # only alphanumerical and dashes
    proj: the project responsible for this training
    bvers: the build version that was used for running this training
    notebook: relative path inside the project's repository to the training notebook that was executed
    task: task doc. Represents the training's progress and state
    details: dictionary with arbitrary details about the training
}

# btw, this is how a task doc is defined:
{
    state: one of WAITING, RUNNING, FINISHED, FAILED, CANCELLED
    progress: number between 0 and 1
    start_time: when the task started
    update_time: when the task's state and/or progress was updated
}
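A task doc can be pictured as a small object that keeps its timestamps in sync with state and progress changes. The sketch below is an illustration of that idea, not Noronha's implementation. The `Task` class, its `update` method and the rule that start_time is set on the first switch to RUNNING are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class State(Enum):
    WAITING = "WAITING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


@dataclass
class Task:
    state: State = State.WAITING
    progress: float = 0.0
    start_time: Optional[datetime] = None
    update_time: Optional[datetime] = None

    def update(self, state: Optional[State] = None,
               progress: Optional[float] = None) -> None:
        """Change state and/or progress and refresh update_time."""
        # Validate before touching any field, so a bad call changes nothing.
        if progress is not None and not 0 <= progress <= 1:
            raise ValueError("progress must be between 0 and 1")
        if progress is not None:
            self.progress = progress
        if state is not None:
            self.state = state
            # Assumption: the task "starts" the first time it becomes RUNNING.
            if state is State.RUNNING and self.start_time is None:
                self.start_time = datetime.now(timezone.utc)
        self.update_time = datetime.now(timezone.utc)


task = Task()
task.update(state=State.RUNNING, progress=0.5)
print(task.state.value, task.progress)
```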

Model Version

Represents a persistent model that was generated during a training. Also referred to as movers or mv.

{
    name: name-of-the-version  # only alphanumerical and dashes
    model: the parent model definition (template) that shapes this version
    train: the training execution that generated this version
    ds: the dataset that was used for training the model
    details: dictionary with arbitrary details about the version
    pretrained: reference to another model version that was used as a pre-trained asset in order to train this one
    compressed: if true, all model files are compressed into a single tar.gz file
    lightweight: if true, the model files are stored in a lightweight file storage
}
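Because pretrained points to another model version, versions can form a chain: a version trained on top of another one, which was itself trained on top of a third. The sketch below walks such a chain over plain dictionaries. The `lineage` helper and the idea of keying versions by name alone are simplifying assumptions for illustration.

```python
def lineage(versions, name):
    """Follow 'pretrained' links from a version back to its oldest ancestor."""
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"cyclic pretrained reference at {current}")
        seen.add(current)
        chain.append(current)
        current = versions[current].get("pretrained")
    return chain


# Hypothetical model versions, keyed by name.
versions = {
    "base": {"pretrained": None},
    "fine-tuned": {"pretrained": "base"},
    "fine-tuned-v2": {"pretrained": "fine-tuned"},
}
print(lineage(versions, "fine-tuned-v2"))
# prints: ['fine-tuned-v2', 'fine-tuned', 'base']
```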

Deployment

Represents a group of one or more identical containers providing a prediction service. Also referred to as depl.

{
    name: name-of-the-deployment  # only alphanumerical and dashes
    proj: the project to which this deployment belongs
    movers: the model version used in this deployment
    bvers: the build version (docker image) used for creating this deployment's containers
    notebook: relative path inside the project's repository to the prediction notebook that is executed
    details: dictionary with arbitrary details about the deployment
}

Treasure Chest

Represents a pair of credentials recorded and stored securely in the framework. Also referred to as tchest.

{
    name: name-of-the-tchest  # only alphanumerical and dashes
    owner: os-user-to-whom-it-belongs
    desc: free text description
    details: dictionary with arbitrary details about the tchest
}