Understanding Merlin's Database Entities

Merlin's database has the following entities:

Study Entity

The StudyEntity represents a single “study” or experiment grouping in Merlin. Each entry is unique by study name, which is defined in the description block of the specification file. Each study acts as a namespace under which runs are organized and tracked.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid4 | Primary key. Unique identifier for the study. |
| name | str | Human-readable name for the study, unique to each StudyEntity. |
| runs | List[uuid4] | List of Run IDs associated with this study. |

Relationships:

  • One-to-many with RunEntity: A single study can have multiple runs.
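
The namespace behavior described above can be illustrated with a minimal in-memory sketch. The names `StudySketch`, `studies`, `get_or_create_study`, and `add_run` are hypothetical and exist only for this example; they are not Merlin's API, and Merlin's actual storage backend is not shown here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudySketch:
    """Hypothetical stand-in for a study record; not Merlin's actual class."""

    name: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    runs: List[str] = field(default_factory=list)


# Hypothetical in-memory store keyed by study name, so each name maps to one study.
studies: Dict[str, StudySketch] = {}


def get_or_create_study(name: str) -> StudySketch:
    """Return the existing study for this name, or create it if it is new."""
    if name not in studies:
        studies[name] = StudySketch(name=name)
    return studies[name]


def add_run(study: StudySketch) -> str:
    """Generate a new run ID and register it under the given study."""
    run_id = str(uuid.uuid4())
    study.runs.append(run_id)
    return run_id
```

Calling `get_or_create_study` twice with the same name returns the same record, so both runs end up in a single study's `runs` list.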

Run Entity

The RunEntity represents a single execution of a study. It captures the configuration, intermediate data, and relationships to other entities (like workers and studies).

| Column | Type | Description |
| --- | --- | --- |
| id | uuid4 | Primary key. Unique identifier for the run. |
| study_id | uuid4 | Foreign key → StudyEntity(id). Which study this run belongs to. |
| workspace | str | Filesystem path where outputs of the study are stored. |
| steps | List[uuid4] | Ordered list of Step IDs executed in this run. |
| queues | List[str] | List of task queue names used by this run. |
| workers | List[uuid4] | List of LogicalWorker IDs serving tasks for this run. |
| parent | uuid4 \| NULL | ID of parent run (if this run was started by another run). |
| child | uuid4 \| NULL | ID of child run (if this run spawned a new run). |
| run_status | str | The status of a run. |
| parameters | Dict | Arbitrary key/value parameters provided to the run. |
| samples | Dict | Arbitrary samples provided to the run. |

Relationships:

  • Many-to-one with StudyEntity: Multiple runs can be assigned to the same study.
  • Many-to-many with LogicalWorkerEntity: Multiple runs can be linked to multiple logical workers.
  • Optional one-to-one with parent/child RunEntity: A single run can link to another run.
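
The optional parent/child link can be pictured as a pair of nullable IDs that point at each other. The sketch below is illustrative only: `RunSketch`, `spawn_child`, and `lineage` are hypothetical names, and placing the child in the same study as its parent is an assumption made for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunSketch:
    """Hypothetical subset of a run record; not Merlin's actual class."""

    study_id: str
    workspace: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent: Optional[str] = None  # None plays the role of NULL
    child: Optional[str] = None


def spawn_child(parent: RunSketch, workspace: str) -> RunSketch:
    """Create a child run and link both sides of the one-to-one relationship."""
    # Assumption: the child is placed in the same study as its parent.
    child = RunSketch(study_id=parent.study_id, workspace=workspace, parent=parent.id)
    parent.child = child.id
    return child


def lineage(run: RunSketch, runs_by_id: Dict[str, RunSketch]) -> List[str]:
    """Follow parent links back to the first run; return IDs oldest first."""
    chain = [run.id]
    current = run
    while current.parent is not None:
        current = runs_by_id[current.parent]
        chain.append(current.id)
    return list(reversed(chain))
```

Because each run holds at most one `parent` and one `child`, a sequence of runs forms a simple chain that can be walked in either direction.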

Run Status

The run_status entry is not what the status commands track. Those commands track step- and task-level statuses, whereas this entry tracks the run-level status, which is important for the merlin monitor command.

Below is a table of possible statuses for a run.

| Status | Description |
| --- | --- |
| INITIALIZED | Run has been created in the database but not queued. |
| QUEUED | Run is queued on the task server and waiting to start. |
| RUNNING | Run is currently executing. |
| COMPLETED | Run has finished successfully. |
| CANCELLED | Run was cancelled by the user. |
| FAILED | Run hard failed due to an error. |
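
Since run_status is stored as a string, code that consumes it may find it convenient to map the value onto an enumeration. The following is a minimal sketch under that assumption; `RunStatusSketch`, `TERMINAL_STATUSES`, and `is_finished` are hypothetical names, and treating COMPLETED, CANCELLED, and FAILED as the terminal statuses is an interpretation of the table above rather than documented Merlin behavior.

```python
from enum import Enum


class RunStatusSketch(str, Enum):
    """Hypothetical enum mirroring the status names listed in the table."""

    INITIALIZED = "INITIALIZED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    CANCELLED = "CANCELLED"
    FAILED = "FAILED"


# Assumption: a run in any of these statuses will not make further progress.
TERMINAL_STATUSES = frozenset(
    {RunStatusSketch.COMPLETED, RunStatusSketch.CANCELLED, RunStatusSketch.FAILED}
)


def is_finished(run_status: str) -> bool:
    """Return True if the stored status string names a terminal status.

    Raises ValueError if the string is not one of the known statuses.
    """
    return RunStatusSketch(run_status) in TERMINAL_STATUSES
```

A monitoring loop could use a check like `is_finished` to decide whether a run still needs attention.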

Worker Entities

Merlin supports two distinct worker models: logical and physical. Logical workers define high-level behavior and configuration. Physical workers represent actual runtime processes launched from logical definitions. The sections below describe both entities in more detail.

Logical Worker Entity

The LogicalWorkerEntity defines an abstract worker configuration, consisting of a name and a set of queues, which serves as a template for actual (physical) worker instances. Each logical worker is uniquely identified by the combination of its name and queues. For instance, LogicalWorker(name=worker1, queues=[queue1, queue2]) is different from LogicalWorker(name=worker1, queues=[queue1]), which is also different from LogicalWorker(name=worker2, queues=[queue1, queue2]).

| Column | Type | Description |
| --- | --- | --- |
| id | uuid4 | Primary key. Deterministically generated from name + queues. |
| name | str | Logical name of the worker (e.g., "data-processor"). |
| queues | List[str] | The set of queue names this logical worker listens on. |
| runs | List[uuid4] | List of RunEntity IDs currently using this logical worker. |
| physical_workers | List[uuid4] | List of PhysicalWorker IDs instantiated from this logical template. |

Relationships:

  • One-to-many with PhysicalWorkerEntity: A single logical worker can have multiple physical worker instances.
  • Many-to-many with RunEntity: Multiple logical workers can be linked to multiple runs.
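
One way to derive a deterministic ID from a name and a set of queues is name-based UUID hashing. The sketch below uses Python's `uuid.uuid5` purely as an illustration; it is not necessarily the scheme Merlin uses. The namespace value and the function name `logical_worker_id` are hypothetical, and ignoring queue order is an assumption based on the description of queues as a set.

```python
import uuid
from typing import Iterable

# Hypothetical namespace for this sketch; any fixed UUID would serve.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "logical-worker.example")


def logical_worker_id(name: str, queues: Iterable[str]) -> str:
    """Derive a repeatable ID from a worker name and its queues."""
    # Assumption: queues behave like a set, so sort and de-duplicate them
    # to make the ID independent of the order in which they are listed.
    key = f"{name}:{','.join(sorted(set(queues)))}"
    return str(uuid.uuid5(SKETCH_NAMESPACE, key))
```

With this scheme, the same name and queues always produce the same ID, while changing either the name or the queue set produces a different one, which matches the three LogicalWorker examples given earlier in this section.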

Physical Worker Entity

The PhysicalWorkerEntity represents an actual running instance of a worker process, created from a logical worker definition. It contains runtime-specific metadata for monitoring and control.

| Column | Type | Description |
| --- | --- | --- |
| id | uuid4 | Primary key. Unique identifier for this running process. |
| logical_worker_id | uuid4 | Foreign key → LogicalWorkerEntity(id). |
| name | str | Full Celery worker name (e.g., celery@hostname). |
| launch_cmd | str | Exact CLI used to start the worker. |
| args | Dict | Additional runtime args or config passed to the worker process. |
| pid | str | OS process ID in string format. |
| worker_status | WorkerStatus | Current status of the worker (e.g., RUNNING, STOPPED). |
| heartbeat_timestamp | datetime | Last time the worker checked in. |
| latest_start_time | datetime | When this process was most recently (re)launched. |
| host | str | Hostname or IP where this process is running. |
| restart_count | int | How many times the process has been restarted. |

Relationships:

  • Many-to-one with LogicalWorkerEntity: Multiple physical workers can be linked to a single logical worker.
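
A field like heartbeat_timestamp is typically used to detect workers that have stopped checking in. The following sketch shows one possible staleness check. The function name `is_stale` and the `max_age` threshold are hypothetical, timestamps are assumed to be timezone-aware, and Merlin's own monitoring logic may work differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(
    heartbeat_timestamp: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the last heartbeat is older than max_age.

    Assumes heartbeat_timestamp is timezone-aware. The now argument
    lets callers and tests supply a fixed reference time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - heartbeat_timestamp > max_age
```

For example, with a threshold of `timedelta(minutes=5)`, a worker whose last heartbeat was ten minutes before `now` would be reported as stale, while one that checked in a minute ago would not.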