Noronha’s Data Model¶
This section describes how Noronha stores its metadata in MongoDB and how these metadata documents relate to each other. Reading and understanding this section is going to help you when creating and manipulating projects, models and other objects in Noronha.
The diagram bellow gives a hint on the document relationships described in detail here:
This guide adopts the following conventions for representing document fields that link to other documents:
Referenced document: fields in bold are like pointers. Their content is always consistent with the original document they refer to. Fields like these are meant to answer questions like: which model is my project using? Which model files are expected?
Embedded document: fields in italic are like snapshots. Their content is a report of how the referred document was when the field was updated. Fields like these are meant to answer questions like: which version of my project’s code was used in that training?
Project¶
Represents a project that is managed by the framework. Also referred to as proj.
{
name: name-of-the-project # only alphanumerical and dashes
desc: free text description
model: list of models used by this project
home_dir: local directory where the project is hosted
git_repo: the project's remote Git repository
docker_repo: the project's remote Docker repository
# see project repositories
}
Build Version¶
Represents the Docker image that was created when the project was built by Noronha. Also referred to as bvers or bv (not to mistake for beavers :D).
{
tag: Docker tag
proj: the project which was built
docker_id: the Docker hash associated to the image that was created
git_version: the Git hash associated to the last commit before the project was built
built_at: date and time when it was built
built_from: either 'local', 'git' or 'pre-built' (determined by the build command)
}
Model¶
Represents a model that is managed by the framework.
{
name: name-of-the-model # only alphanumerical and dashes
desc: free text description
model_files: list of file docs. These files compose the model's persistence
data_files: list of file docs. These files compose a dataset for training the model
}
# btw, this is how a file doc is defined:
{
name: file.extension
desc: free text description
required: if true, this file can never be left out
max_mb: maximum file size in MB. Not necessary, but good to know
}
Note that this is not a model version, but a model definition: it’s like a template that describes how a model is going to be persisted. Of course, when starting project we usually have no clue of how the model is going to be, but don’t worry: all properties except the model’s name can be edited later.
Dataset¶
Represents a dataset that is managed by the framework. Also referred to as ds (not a data scientist though :D).
{
name: name-of-the-dataset # only alphanumerical and dashes
model: the model to which this dataset belongs
stored: if true, the dataset files are stored in Noronha's file manager
details: dictionary with arbitrary details about the dataset
compressed: if true, all dataset files are compressed into a single tar.gz file
lightweight: if true, the dataset files are stored in a lightweight file storage
}
Training¶
Represents the execution of a training. Also referred to as train (not the one that runs on rails :D).
{
name: name-of-the-training # only alphanumerical and dashes
proj: the project responsible for this training
bvers: the build version that was used for running this training
notebook: relative path inside the project's repository to the training notebook that was executed
task: task doc. Represents the training's progress and state
details: dictionary with arbitrary details about the training
}
# btw, this is how a task doc is defined:
{
state: either one of WAITING, RUNNING, FINISHED, FAILED, CANCELLED
progress: number between 0 and 1
start_time: when the task started
update_time: when the task's state and/or progress was updated
}
Model Version¶
Represents a persistent model that was generated during a training. Also referred to as movers or mv.
{
name: name-of-the-version # only alphanumerical and dashes
model: the parent model definition (template) that shapes this version
train: the training execution that generated this version
ds: the dataset that was used for training the model
details: dictionary with arbitrary details about the version
pretrained: reference to another model version that was used as a pre-trained asset in order to train this one
compressed: if true, all model files are compressed into a single tar.gz file
lightweight: if true, the model files are stored in a lightweight file storage
}
Deployment¶
Represents a group of one or more identical containers providing a prediction service. Also referred to as depl.
{
name: name-of-the-deployment # only alphanumerical and dashes
proj: the project to which this deployment belongs
movers: the model version used in this deployment
bvers: the build version (docker image) used for creating this deployment's containers
notebook: relative path inside the project's repository to the prediction notebook that is executed
details: dictionary with arbitrary details about the deployment
}
Treasure Chest¶
Represents a pair of credentials recorded and stored securely in the framework. Also referred to as tchest.
{
name: name-of-the-tchest # only alphanumerical and dashes
owner: os-user-to-whom-it-belongs
desc: free text description
details: dictionary with arbitrary details about the tchest
}