
data_models

This module houses dataclasses that define the format of the data that's stored in Merlin's database.

BaseDataModel dataclass

Bases: ABC

A base class for dataclasses that provides common serialization, deserialization, and update functionality, with support for additional data.

This class is designed to be extended by other dataclasses and includes methods for converting instances to and from dictionaries or JSON, managing fields, and updating field values with validation.

Attributes:

additional_data (Dict): A dictionary to store any extra data not explicitly defined as fields in the dataclass.
fields_allowed_to_be_updated (List[str]): A list of field names that are allowed to be updated. Must be defined in subclasses.

Methods:

to_dict: Convert the dataclass instance to a dictionary.
to_json: Serialize the dataclass instance to a JSON string.
from_dict (classmethod): Create an instance of the dataclass from a dictionary.
from_json (classmethod): Create an instance of the dataclass from a JSON string.
dump_to_json_file: Dump the data of this dataclass to a JSON file.
load_from_json_file (classmethod): Load the data stored in a JSON file into an instance of this dataclass.
get_instance_fields: Retrieve the fields associated with this dataclass instance.
get_class_fields (classmethod): Retrieve the fields associated with the dataclass class itself.
update_fields: Update the fields of the dataclass based on a given dictionary of updates.

Source code in merlin/db_scripts/data_models.py
@dataclass
class BaseDataModel(ABC):
    """
    A base class for dataclasses that provides common serialization, deserialization, and
    update functionality, with support for additional data.

    This class is designed to be extended by other dataclasses and includes methods for
    converting instances to and from dictionaries or JSON, managing fields, and updating
    field values with validation.

    Attributes:
        additional_data: A dictionary to store any extra data not explicitly defined
            as fields in the dataclass.
        fields_allowed_to_be_updated: A list of field names that are allowed to be updated.
            Must be defined in subclasses.

    Methods:
        to_dict:
            Convert the dataclass instance to a dictionary.

        to_json:
            Serialize the dataclass instance to a JSON string.

        from_dict (classmethod):
            Create an instance of the dataclass from a dictionary.

        from_json (classmethod):
            Create an instance of the dataclass from a JSON string.

        dump_to_json_file:
            Dump the data of this dataclass to a JSON file.

        load_from_json_file (classmethod):
            Load the data stored in a JSON file into an instance of this dataclass.

        get_instance_fields:
            Retrieve the fields associated with this dataclass instance.

        get_class_fields (classmethod):
            Retrieve the fields associated with the dataclass class itself.

        update_fields:
            Update the fields of the dataclass based on a given dictionary of updates.
    """

    additional_data: Dict = field(default_factory=dict)

    def to_dict(self) -> Dict:
        """
        Convert the dataclass to a dictionary.

        Returns:
            The dataclass as a dictionary.
        """
        return asdict(self)

    def to_json(self) -> str:
        """
        Serialize the dataclass to a JSON string.

        Returns:
            The dataclass as a JSON string.
        """
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls: Type[T], data: Dict) -> T:
        """
        Create an instance of the dataclass from a dictionary.

        Args:
            data: A dictionary to turn into an instance of this dataclass.

        Returns:
            An instance of the dataclass that called this.
        """
        # Handle backwards compatibility between 2.0.0b2 and 2.0.0b3: migrate run_complete to run_status
        if "run_complete" in data and "run_status" not in data:
            run_complete = data.pop("run_complete")
            # Map the boolean to an appropriate status
            data["run_status"] = RunStatus.COMPLETED.value if run_complete else RunStatus.RUNNING.value
        elif "run_complete" in data and "run_status" in data:
            # Remove run_complete if both keys exist to avoid conflicts
            data.pop("run_complete")

        return cls(**data)

    @classmethod
    def from_json(cls: Type[T], json_str: str) -> T:
        """
        Create an instance of the dataclass from a JSON string.

        Args:
            json_str: A JSON string to turn into an instance of this dataclass.

        Returns:
            An instance of the dataclass that called this.
        """
        data = json.loads(json_str)
        return cls.from_dict(data)

    def dump_to_json_file(self, filepath: str):
        """
        Dump the data of this dataclass to a JSON file.

        Args:
            filepath: The path to the JSON file where the data will be written.

        Raises:
            ValueError: If `filepath` is empty or not provided.
        """
        if not filepath:
            raise ValueError("A valid file path must be provided.")

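        # Ensure the directory for the file exists (dirname is "" for a bare filename,
        # and os.makedirs("") would raise FileNotFoundError)
        directory = os.path.dirname(filepath)
        if directory:
            os.makedirs(directory, exist_ok=True)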

        # Create a lock file alongside the target JSON file
        lock_file = f"{filepath}.lock"
        with FileLock(lock_file):  # pylint: disable=abstract-class-instantiated
            # Write the data to the JSON file
            temp_filepath = f"{filepath}.tmp"  # Use a temporary file for atomic writes
            with open(temp_filepath, "w") as json_file:
                json.dump(self.to_dict(), json_file, indent=4)

            # Replace the temporary file with the target file
            os.replace(temp_filepath, filepath)

        LOG.debug(f"Data successfully dumped to {filepath}.")

    @classmethod
    def load_from_json_file(cls: Type[T], filepath: str) -> T:
        """
        Load the data stored in a JSON file into an instance of this dataclass.

        Args:
            filepath: The path to the JSON file where the data is located.

        Returns:
            An instance of the dataclass that called this.

        Raises:
            ValueError: If `filepath` is not provided or the file does not exist.
        """
        if not filepath or not os.path.exists(filepath):
            raise ValueError("A valid file path must be provided.")

        # Create a lock file alongside the target JSON file
        lock_file = f"{filepath}.lock"
        with FileLock(lock_file):  # pylint: disable=abstract-class-instantiated
            with open(filepath, "r") as json_file:
                # Parse the JSON data into a dictionary
                data = json.load(json_file)

        # Use from_dict to create an instance of the dataclass
        return cls.from_dict(data)

    def get_instance_fields(self) -> Tuple[Field]:
        """
        Get the fields associated with this instance. This method exists so that `dataclasses.fields`
        doesn't have to be imported each time you want this info.

        Returns:
            A tuple of `dataclasses.Field` objects representing the fields in this dataclass.
        """
        return dataclass_fields(self)

    @classmethod
    def get_class_fields(cls) -> Tuple[Field]:
        """
        Get the fields associated with this class. This method exists so that `dataclasses.fields`
        doesn't have to be imported each time you want this info.

        Returns:
            A tuple of `dataclasses.Field` objects representing the fields in this dataclass.
        """
        return dataclass_fields(cls)

    @property
    @abstractmethod
    def fields_allowed_to_be_updated(self) -> List[str]:
        """
        A property to be overridden in subclasses to define which fields are allowed to be updated.

        Returns:
            A list of fields that are allowed to be updated in this class.
        """

    def update_fields(self, updates: Dict):
        """
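        Apply a dictionary of updates to this dataclass. The `id` field and unchanged values are
        skipped, fields not listed in `fields_allowed_to_be_updated` are ignored with a warning,
        and keys that are not attributes of the object are added to `additional_data` (also
        with a warning).

        Args:
            updates: A dictionary of updates to be made to this dataclass.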
        """
        # Iterate through the updates
        for field_name, new_value in updates.items():
            if field_name == "id":
                continue

            if hasattr(self, field_name):
                if getattr(self, field_name) == new_value:  # Not an update so skip
                    continue

                if field_name in self.fields_allowed_to_be_updated:
                    # Update the allowed field
                    setattr(self, field_name, new_value)
                else:
                    # Log a warning for unauthorized updates
                    LOG.warning(f"Field '{field_name}' is not allowed to be updated. Ignoring the change.")
            else:
                # Log a warning if the field doesn't exist explicitly
                LOG.warning(
                    f"Field '{field_name}' does not explicitly exist in the object. Adding it to the 'additional_data' field."
                )
                self.additional_data[field_name] = new_value

fields_allowed_to_be_updated abstractmethod property

A property to be overridden in subclasses to define which fields are allowed to be updated.

Returns:

List[str]: A list of fields that are allowed to be updated in this class.
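Because fields_allowed_to_be_updated is declared as an abstract property and BaseDataModel inherits from ABC, the base class itself cannot be instantiated; only subclasses that override the property can. The sketch below illustrates this with two hypothetical stand-ins, `SketchBase` and `SketchStudy`, instead of the real Merlin classes (which need Merlin's own imports). `SketchStudy` returns the same list as StudyModel.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchBase(ABC):
    """Hypothetical stand-in for BaseDataModel, reduced to the abstract property."""

    additional_data: Dict = field(default_factory=dict)

    @property
    @abstractmethod
    def fields_allowed_to_be_updated(self) -> List[str]:
        """Subclasses must return the names of the fields that may be updated."""


@dataclass
class SketchStudy(SketchBase):
    """Hypothetical subclass that mirrors StudyModel's update rule."""

    name: str = None
    runs: List[str] = field(default_factory=list)

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        return ["runs"]


# ABCMeta refuses to build an instance while an abstract member is unimplemented
try:
    SketchBase()
    base_instantiation_failed = False
except TypeError as exc:
    base_instantiation_failed = True
    print(f"TypeError: {exc}")

study = SketchStudy(name="demo")
print(study.fields_allowed_to_be_updated)  # ['runs']
```

The property carries no type annotation, so the dataclass machinery does not treat it as a field; it never shows up in to_dict() output.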

dump_to_json_file(filepath)

Dump the data of this dataclass to a JSON file.

Parameters:

filepath (str, required): The path to the JSON file where the data will be written.

Raises:

ValueError: If filepath is empty or not provided.

Source code in merlin/db_scripts/data_models.py
def dump_to_json_file(self, filepath: str):
    """
    Dump the data of this dataclass to a JSON file.

    Args:
        filepath: The path to the JSON file where the data will be written.

    Raises:
        ValueError: If `filepath` is empty or not provided.
    """
    if not filepath:
        raise ValueError("A valid file path must be provided.")

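    # Ensure the directory for the file exists (dirname is "" for a bare filename,
    # and os.makedirs("") would raise FileNotFoundError)
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)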

    # Create a lock file alongside the target JSON file
    lock_file = f"{filepath}.lock"
    with FileLock(lock_file):  # pylint: disable=abstract-class-instantiated
        # Write the data to the JSON file
        temp_filepath = f"{filepath}.tmp"  # Use a temporary file for atomic writes
        with open(temp_filepath, "w") as json_file:
            json.dump(self.to_dict(), json_file, indent=4)

        # Replace the temporary file with the target file
        os.replace(temp_filepath, filepath)

    LOG.debug(f"Data successfully dumped to {filepath}.")
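The write path combines two ideas: the JSON is first written to a `.tmp` sibling, and `os.replace` then swaps it into place. On POSIX systems that rename is atomic when both paths are on the same filesystem, so a reader sees either the old file or the new one. The stdlib-only sketch below uses a hypothetical helper, `atomic_json_dump`, that keeps this pattern but leaves out the FileLock step. It also guards the directory creation, because `os.path.dirname` returns an empty string for a bare filename such as `"study.json"`, and `os.makedirs("")` raises FileNotFoundError.

```python
import json
import os
import tempfile


def atomic_json_dump(data: dict, filepath: str) -> None:
    """Hypothetical helper: write `data` as JSON via a temp file, without a lock."""
    if not filepath:
        raise ValueError("A valid file path must be provided.")

    # os.path.dirname("study.json") is "", and os.makedirs("") raises FileNotFoundError
    directory = os.path.dirname(filepath)
    if directory:
        os.makedirs(directory, exist_ok=True)

    temp_filepath = f"{filepath}.tmp"
    with open(temp_filepath, "w") as json_file:
        json.dump(data, json_file, indent=4)

    # Swap the finished temp file into place with a single rename
    os.replace(temp_filepath, filepath)


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "nested", "study.json")  # "nested" does not exist yet
    atomic_json_dump({"name": "demo", "runs": []}, target)
    with open(target) as handle:
        loaded = json.load(handle)
    leftover_tmp = os.path.exists(f"{target}.tmp")

print(loaded)  # {'name': 'demo', 'runs': []}
print(leftover_tmp)  # False
```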

from_dict(data) classmethod

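Create an instance of the dataclass from a dictionary. For backwards compatibility between 2.0.0b2 and 2.0.0b3, a legacy run_complete key is converted to run_status (COMPLETED if it was true, RUNNING otherwise), or simply dropped when run_status is already present. The key is popped from data itself, so the caller's dictionary is modified.

Parameters:

data (Dict, required): A dictionary to turn into an instance of this dataclass.

Returns:

T: An instance of the dataclass that called this.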

Source code in merlin/db_scripts/data_models.py
@classmethod
def from_dict(cls: Type[T], data: Dict) -> T:
    """
    Create an instance of the dataclass from a dictionary.

    Args:
        data: A dictionary to turn into an instance of this dataclass.

    Returns:
        An instance of the dataclass that called this.
    """
    # Handle backwards compatibility between 2.0.0b2 and 2.0.0b3: migrate run_complete to run_status
    if "run_complete" in data and "run_status" not in data:
        run_complete = data.pop("run_complete")
        # Map the boolean to an appropriate status
        data["run_status"] = RunStatus.COMPLETED.value if run_complete else RunStatus.RUNNING.value
    elif "run_complete" in data and "run_status" in data:
        # Remove run_complete if both keys exist to avoid conflicts
        data.pop("run_complete")

    return cls(**data)
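The backwards-compatibility branch can be isolated as a small function. As written, from_dict pops run_complete from the data argument itself, so the caller's dictionary changes as a side effect; the hypothetical `migrate_run_fields` helper below applies the same mapping to a copy instead. `RunStatusSketch` is an assumed stand-in for Merlin's RunStatus enum, and its string values are placeholders that may not match the real ones.

```python
from enum import Enum
from typing import Dict


class RunStatusSketch(Enum):
    """Hypothetical stand-in for Merlin's RunStatus; the real values may differ."""

    INITIALIZED = "initialized"
    RUNNING = "running"
    COMPLETED = "completed"


def migrate_run_fields(data: Dict) -> Dict:
    """Return a copy of `data` with the legacy `run_complete` key folded into `run_status`."""
    migrated = dict(data)  # copy so the caller's dictionary is left untouched
    if "run_complete" in migrated:
        run_complete = migrated.pop("run_complete")
        # An existing run_status wins; otherwise map the old boolean onto a status
        if "run_status" not in migrated:
            migrated["run_status"] = (
                RunStatusSketch.COMPLETED.value if run_complete else RunStatusSketch.RUNNING.value
            )
    return migrated


legacy = {"id": "abc", "run_complete": True}
print(migrate_run_fields(legacy))  # {'id': 'abc', 'run_status': 'completed'}
print(legacy)  # {'id': 'abc', 'run_complete': True}
```

The resulting dictionary could then be passed to `cls(**migrated)` exactly as the original method does.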

from_json(json_str) classmethod

Create an instance of the dataclass from a JSON string.

Parameters:

json_str (str, required): A JSON string to turn into an instance of this dataclass.

Returns:

T: An instance of the dataclass that called this.

Source code in merlin/db_scripts/data_models.py
@classmethod
def from_json(cls: Type[T], json_str: str) -> T:
    """
    Create an instance of the dataclass from a JSON string.

    Args:
        json_str: A JSON string to turn into an instance of this dataclass.

    Returns:
        An instance of the dataclass that called this.
    """
    data = json.loads(json_str)
    return cls.from_dict(data)
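Taken together, to_json and from_json amount to `json.dumps(asdict(obj))` followed by `cls(**json.loads(...))`. The sketch below shows that round trip with a hypothetical `SketchStudy` dataclass carrying StudyModel's field names; it reproduces the equality only because every field holds a JSON-native type (strings, lists, dicts).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SketchStudy:
    """Hypothetical stand-in for StudyModel with the same field names."""

    id: str = "study-1"
    name: str = None
    runs: List[str] = field(default_factory=list)
    additional_data: Dict = field(default_factory=dict)


original = SketchStudy(name="demo", runs=["run-1"], additional_data={"owner": "alice"})
json_str = json.dumps(asdict(original))  # what to_json() does
restored = SketchStudy(**json.loads(json_str))  # what from_json() -> from_dict() does

print(json_str)
print(restored == original)  # True
```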

get_class_fields() classmethod

Get the fields associated with this class. This method exists so that dataclasses.fields doesn't have to be imported each time you want this info.

Returns:

Tuple[Field]: A tuple of dataclasses.Field objects representing the fields in this dataclass.

Source code in merlin/db_scripts/data_models.py
@classmethod
def get_class_fields(cls) -> Tuple[Field]:
    """
    Get the fields associated with this class. This method exists so that `dataclasses.fields`
    doesn't have to be imported each time you want this info.

    Returns:
        A tuple of `dataclasses.Field` objects representing the fields in this dataclass.
    """
    return dataclass_fields(cls)

get_instance_fields()

Get the fields associated with this instance. This method exists so that dataclasses.fields doesn't have to be imported each time you want this info.

Returns:

Tuple[Field]: A tuple of dataclasses.Field objects representing the fields in this dataclass.

Source code in merlin/db_scripts/data_models.py
def get_instance_fields(self) -> Tuple[Field]:
    """
    Get the fields associated with this instance. This method exists so that `dataclasses.fields`
    doesn't have to be imported each time you want this info.

    Returns:
        A tuple of `dataclasses.Field` objects representing the fields in this dataclass.
    """
    return dataclass_fields(self)
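Both get_class_fields and get_instance_fields are thin wrappers around `dataclasses.fields`, which accepts either a dataclass type or an instance and returns the fields in declaration order. Properties such as fields_allowed_to_be_updated are not fields, so they do not appear. The sketch uses a hypothetical `SketchStudy` in place of StudyModel (the real class would also list the inherited additional_data field).

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SketchStudy:
    """Hypothetical dataclass standing in for StudyModel."""

    id: str = "study-1"
    name: str = None
    runs: List[str] = field(default_factory=list)

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        return ["runs"]


# dataclasses.fields works on the class (get_class_fields) and on an instance (get_instance_fields)
class_field_names = [f.name for f in fields(SketchStudy)]
instance_field_names = [f.name for f in fields(SketchStudy(name="demo"))]

print(class_field_names)  # ['id', 'name', 'runs']
print(class_field_names == instance_field_names)  # True
```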

load_from_json_file(filepath) classmethod

Load the data stored in a JSON file into an instance of this dataclass.

Parameters:

filepath (str, required): The path to the JSON file where the data is located.

Returns:

T: An instance of the dataclass that called this.

Raises:

ValueError: If filepath is not provided or the file does not exist.

Source code in merlin/db_scripts/data_models.py
@classmethod
def load_from_json_file(cls: Type[T], filepath: str) -> T:
    """
    Load the data stored in a JSON file into an instance of this dataclass.

    Args:
        filepath: The path to the JSON file where the data is located.

    Returns:
        An instance of the dataclass that called this.

    Raises:
        ValueError: If `filepath` is not provided or the file does not exist.
    """
    if not filepath or not os.path.exists(filepath):
        raise ValueError("A valid file path must be provided.")

    # Create a lock file alongside the target JSON file
    lock_file = f"{filepath}.lock"
    with FileLock(lock_file):  # pylint: disable=abstract-class-instantiated
        with open(filepath, "r") as json_file:
            # Parse the JSON data into a dictionary
            data = json.load(json_file)

    # Use from_dict to create an instance of the dataclass
    return cls.from_dict(data)

to_dict()

Convert the dataclass to a dictionary.

Returns:

Dict: The dataclass as a dictionary.

Source code in merlin/db_scripts/data_models.py
def to_dict(self) -> Dict:
    """
    Convert the dataclass to a dictionary.

    Returns:
        The dataclass as a dictionary.
    """
    return asdict(self)

to_json()

Serialize the dataclass to a JSON string.

Returns:

str: The dataclass as a JSON string.

Source code in merlin/db_scripts/data_models.py
def to_json(self) -> str:
    """
    Serialize the dataclass to a JSON string.

    Returns:
        The dataclass as a JSON string.
    """
    return json.dumps(self.to_dict())

update_fields(updates)

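Apply a dictionary of updates to this dataclass. The id field and unchanged values are skipped, fields not listed in fields_allowed_to_be_updated are ignored with a warning, and keys that are not attributes of the object are added to additional_data (also with a warning).

Parameters:

updates (Dict, required): A dictionary of updates to be made to this dataclass.
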
Source code in merlin/db_scripts/data_models.py
def update_fields(self, updates: Dict):
    """
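    Apply a dictionary of updates to this dataclass. The `id` field and unchanged values are
    skipped, fields not listed in `fields_allowed_to_be_updated` are ignored with a warning,
    and keys that are not attributes of the object are added to `additional_data` (also
    with a warning).

    Args:
        updates: A dictionary of updates to be made to this dataclass.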
    """
    # Iterate through the updates
    for field_name, new_value in updates.items():
        if field_name == "id":
            continue

        if hasattr(self, field_name):
            if getattr(self, field_name) == new_value:  # Not an update so skip
                continue

            if field_name in self.fields_allowed_to_be_updated:
                # Update the allowed field
                setattr(self, field_name, new_value)
            else:
                # Log a warning for unauthorized updates
                LOG.warning(f"Field '{field_name}' is not allowed to be updated. Ignoring the change.")
        else:
            # Log a warning if the field doesn't exist explicitly
            LOG.warning(
                f"Field '{field_name}' does not explicitly exist in the object. Adding it to the 'additional_data' field."
            )
            self.additional_data[field_name] = new_value
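The update rules can be seen end to end on a small stand-in. The hypothetical `SketchStudy` below copies the loop from update_fields (with StudyModel's allow-list of `["runs"]`) so one call exercises every branch: `id` is skipped, `name` is refused with a warning, `runs` is applied, and the unknown key `owner` lands in additional_data.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List

LOG = logging.getLogger(__name__)


@dataclass
class SketchStudy:
    """Hypothetical stand-in for StudyModel that copies update_fields' rules."""

    id: str = "study-1"
    name: str = None
    runs: List[str] = field(default_factory=list)
    additional_data: Dict = field(default_factory=dict)

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        return ["runs"]

    def update_fields(self, updates: Dict) -> None:
        for field_name, new_value in updates.items():
            if field_name == "id":
                continue  # IDs are never changed
            if hasattr(self, field_name):
                if getattr(self, field_name) == new_value:
                    continue  # no actual change
                if field_name in self.fields_allowed_to_be_updated:
                    setattr(self, field_name, new_value)
                else:
                    LOG.warning("Field '%s' is not allowed to be updated.", field_name)
            else:
                LOG.warning("Field '%s' does not exist; storing it in additional_data.", field_name)
                self.additional_data[field_name] = new_value


study = SketchStudy(name="demo")
study.update_fields({"id": "new-id", "name": "renamed", "runs": ["run-1"], "owner": "alice"})
print(study.id, study.name, study.runs, study.additional_data)
# study-1 demo ['run-1'] {'owner': 'alice'}
```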

LogicalWorkerModel dataclass

Bases: BaseDataModel

Represents a high-level definition of a Celery worker, as defined by the user.

Logical workers are abstract representations of workers that define their behavior and configuration, such as the queues they listen to and their name. They are unique based on their name and queues, and do not correspond directly to any running process. Instead, they serve as templates or logical definitions from which physical workers are created.

Note

Logical workers are abstract and do not represent actual running processes. They are used to define worker behavior and configuration at a high level, while physical workers represent the actual running instances of these logical definitions.

Attributes:

additional_data (Dict): For any extra data not explicitly defined.
fields_allowed_to_be_updated (List[str]): A list of field names that are allowed to be updated.
id (str): A unique identifier for the logical worker, derived as a UUID string from name and queues in __post_init__. A different ID passed to the constructor is overwritten.
name (str): The name of the logical worker.
physical_workers (List[str]): A list of unique IDs of the physical worker instances created from this logical instance. Corresponds with PhysicalWorkerModel entries.
queues (Set[str]): The set of task queues the worker is listening to.
runs (List[str]): A list of unique IDs of the runs using this worker. Corresponds with RunModel entries.

Source code in merlin/db_scripts/data_models.py
@dataclass
class LogicalWorkerModel(BaseDataModel):
    """
    Represents a high-level definition of a Celery worker, as defined by the user.

    Logical workers are abstract representations of workers that define their behavior
    and configuration, such as the queues they listen to and their name. They are unique
    based on their name and queues, and do not correspond directly to any running process.
    Instead, they serve as templates or logical definitions from which physical workers
    are created.

    Note:
        Logical workers are abstract and do not represent actual running processes. They are
        used to define worker behavior and configuration at a high level, while physical workers
        represent the actual running instances of these logical definitions.

    Attributes:
        additional_data (Dict): For any extra data not explicitly defined.
        fields_allowed_to_be_updated (List[str]): A list of field names that are
            allowed to be updated.
        id (str): A unique identifier for the logical worker, derived as a UUID string from
            `name` and `queues` in `__post_init__`. A different ID passed to the constructor
            is overwritten.
        name (str): The name of the logical worker.
        physical_workers (List[str]): A list of unique IDs of the physical worker instances
            created from this logical instance. Corresponds with
            [`PhysicalWorkerModel`][db_scripts.data_models.PhysicalWorkerModel] entries.
        queues (Set[str]): The set of task queues the worker is listening to.
        runs (List[str]): A list of unique IDs of the runs using this worker.
            Corresponds with [`RunModel`][db_scripts.data_models.RunModel] entries.
    """

    name: str = None
    queues: Set[str] = field(default_factory=set)
    id: str = None  # pylint: disable=invalid-name
    runs: List[str] = field(default_factory=list)
    physical_workers: List[str] = field(default_factory=list)

    def __post_init__(self):
        """
        Generate and save a UUID based on the values of `name` and `queues`, to help ensure that
        each logical worker is unique to these values.

        Raises:
            TypeError: When name or queues are not provided to the constructor.
        """
        if self.name is None or not self.queues:
            raise TypeError("The `name` and `queues` arguments of LogicalWorkerModel are required.")

        generated_id = self.generate_id(self.name, self.queues)
        if self.id != generated_id:
            if self.id is not None:
                LOG.warning(f"ID '{self.id}' for LogicalWorkerModel was provided but it will be overwritten.")
            self.id = generated_id

    @classmethod
    def generate_id(cls, name: str, queues: List[str]) -> str:
        """
        Generate a UUID based on the values of `name` and `queues`.

        Args:
            name: The name of the logical worker.
            queues: The queues that the logical worker is assigned to.

        Returns:
            A UUID string derived from the values of `name` and `queues`.
        """
        unique_string = f"{name}:{','.join(sorted(queues))}"
        hex_string = hashlib.md5(unique_string.encode("UTF-8")).hexdigest()
        return str(uuid.UUID(hex=hex_string))

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        """
        Define the fields that are allowed to be updated for a `LogicalWorkerModel` object.

        Returns:
            A list of fields that are allowed to be updated in this class.
        """
        return ["runs", "physical_workers"]

fields_allowed_to_be_updated property

Define the fields that are allowed to be updated for a LogicalWorkerModel object.

Returns:

List[str]: A list of fields that are allowed to be updated in this class.

__post_init__()

Generate and save a UUID based on the values of name and queues, to help ensure that each logical worker is unique to these values.

Raises:

TypeError: When name or queues are not provided to the constructor.

Source code in merlin/db_scripts/data_models.py
def __post_init__(self):
    """
    Generate and save a UUID based on the values of `name` and `queues`, to help ensure that
    each logical worker is unique to these values.

    Raises:
        TypeError: When name or queues are not provided to the constructor.
    """
    if self.name is None or not self.queues:
        raise TypeError("The `name` and `queues` arguments of LogicalWorkerModel are required.")

    generated_id = self.generate_id(self.name, self.queues)
    if self.id != generated_id:
        if self.id is not None:
            LOG.warning(f"ID '{self.id}' for LogicalWorkerModel was provided but it will be overwritten.")
        self.id = generated_id
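The validation in __post_init__ means a logical worker cannot exist without a name and at least one queue, and its ID is always the derived one. The hypothetical `SketchLogicalWorker` below reduces the class to that logic (the original additionally logs a warning when it overwrites a supplied ID, which the sketch omits).

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SketchLogicalWorker:
    """Hypothetical reduction of LogicalWorkerModel to its validation and ID logic."""

    name: str = None
    queues: Set[str] = field(default_factory=set)
    id: str = None
    runs: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Same required-argument check as LogicalWorkerModel.__post_init__
        if self.name is None or not self.queues:
            raise TypeError("The `name` and `queues` arguments are required.")
        unique_string = f"{self.name}:{','.join(sorted(self.queues))}"
        digest = hashlib.md5(unique_string.encode("UTF-8")).hexdigest()
        # Any caller-supplied ID is replaced by the derived one
        self.id = str(uuid.UUID(hex=digest))


worker = SketchLogicalWorker(name="worker1", queues={"queue_a"}, id="user-supplied")
print(worker.id == "user-supplied")  # False: the supplied ID was overwritten

try:
    SketchLogicalWorker(name="worker1")  # no queues given
    missing_queues_rejected = False
except TypeError as exc:
    missing_queues_rejected = True
    print(f"TypeError: {exc}")
```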

generate_id(name, queues) classmethod

Generate a UUID string based on the values of name and queues.

Parameters:

name (str, required): The name of the logical worker.
queues (List[str], required): The queues that the logical worker is assigned to.

Returns:

str: A UUID string derived from the values of name and queues.

Source code in merlin/db_scripts/data_models.py
@classmethod
def generate_id(cls, name: str, queues: List[str]) -> str:
    """
    Generate a UUID based on the values of `name` and `queues`.

    Args:
        name: The name of the logical worker.
        queues: The queues that the logical worker is assigned to.

    Returns:
        A UUID string derived from the values of `name` and `queues`.
    """
    unique_string = f"{name}:{','.join(sorted(queues))}"
    hex_string = hashlib.md5(unique_string.encode("UTF-8")).hexdigest()
    return str(uuid.UUID(hex=hex_string))
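Because the queues are sorted before hashing, the ID depends only on the worker's name and the set of queue names, not on the order in which the queues were listed. The sketch below copies the hashing steps into a hypothetical free function, `generate_logical_worker_id`, to show that property.

```python
import hashlib
import uuid
from typing import Iterable


def generate_logical_worker_id(name: str, queues: Iterable[str]) -> str:
    """Hypothetical free-function copy of LogicalWorkerModel.generate_id."""
    unique_string = f"{name}:{','.join(sorted(queues))}"
    hex_string = hashlib.md5(unique_string.encode("UTF-8")).hexdigest()
    return str(uuid.UUID(hex=hex_string))


id_a = generate_logical_worker_id("worker1", ["queue_b", "queue_a"])
id_b = generate_logical_worker_id("worker1", {"queue_a", "queue_b"})
id_c = generate_logical_worker_id("worker2", ["queue_a", "queue_b"])

print(id_a == id_b)  # True: queue order does not matter
print(id_a == id_c)  # False: a different name gives a different ID
print(type(id_a).__name__)  # str
```

As a design note, the standard library's `uuid.uuid3(namespace, name)` also derives a UUID from an MD5 hash but additionally sets the version and variant bits; `uuid.UUID(hex=...)` keeps the raw hash bits, so the result is UUID-shaped and deterministic, which is sufficient for using it as a stable key.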

PhysicalWorkerModel dataclass

Bases: BaseDataModel

Represents a running instance of a Celery worker, created from a logical worker definition.

Physical workers are the actual implementations of logical workers, running as processes on a host machine. They are responsible for executing tasks defined in the queues specified by their corresponding logical worker. Each physical worker is uniquely identified and includes runtime-specific details such as its PID, status, and heartbeat timestamp.

Attributes:

additional_data (Dict): For any extra data not explicitly defined.
args (Dict): A dictionary of arguments used to configure the worker.
fields_allowed_to_be_updated (List[str]): A list of field names that are allowed to be updated.
heartbeat_timestamp (datetime): The last time the worker sent a heartbeat signal.
host (str): The hostname or IP address of the machine running the worker.
id (str): A unique identifier for the physical worker. Defaults to a UUID string.
latest_start_time (datetime): The timestamp when the worker process was last started.
launch_cmd (str): The command used to launch the worker process.
logical_worker_id (str): The ID of the logical worker that this was created from.
name (str): The name of the physical worker.
pid (str): The process ID (PID) of the worker process.
restart_count (int): The number of times this worker has been restarted.
worker_status (str): The current status of the worker (e.g., running, stopped).

Source code in merlin/db_scripts/data_models.py
@dataclass
class PhysicalWorkerModel(BaseDataModel):  # pylint: disable=too-many-instance-attributes
    """
    Represents a running instance of a Celery worker, created from a logical worker definition.

    Physical workers are the actual implementations of logical workers, running as processes on a host machine.
    They are responsible for executing tasks defined in the queues specified by their corresponding logical worker.
    Each physical worker is uniquely identified and includes runtime-specific details such as its PID, status, and
    heartbeat timestamp.

    Attributes:
        additional_data (Dict): For any extra data not explicitly defined.
        args (Dict): A dictionary of arguments used to configure the worker.
        fields_allowed_to_be_updated (List[str]): A list of field names that are
            allowed to be updated.
        heartbeat_timestamp (datetime): The last time the worker sent a heartbeat signal.
        host (str): The hostname or IP address of the machine running the worker.
        id (str): A unique identifier for the physical worker. Defaults to a UUID string.
        latest_start_time (datetime): The timestamp when the worker process was last started.
        launch_cmd (str): The command used to launch the worker process.
        logical_worker_id (str): The ID of the logical worker that this was created from.
        name (str): The name of the physical worker.
        pid (str): The process ID (PID) of the worker process.
        restart_count (int): The number of times this worker has been restarted.
        worker_status (str): The current status of the worker (e.g., running, stopped).
    """

    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # pylint: disable=invalid-name
    logical_worker_id: str = None
    name: str = None  # Will be of the form celery@worker_name.hostname
    launch_cmd: str = None
    args: Dict = field(default_factory=dict)
    pid: str = None
    worker_status: str = field(default=WorkerStatus.STOPPED.value)
    heartbeat_timestamp: datetime = field(default_factory=datetime.now)
    latest_start_time: datetime = field(default_factory=datetime.now)
    host: str = None
    restart_count: int = 0

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        """
        Define the fields that are allowed to be updated for a `PhysicalWorkerModel` object.

        Returns:
            A list of fields that are allowed to be updated in this class.
        """
        return [
            "launch_cmd",
            "args",
            "pid",
            "worker_status",
            "heartbeat_timestamp",
            "latest_start_time",
            "restart_count",
        ]

fields_allowed_to_be_updated property

Define the fields that are allowed to be updated for a PhysicalWorkerModel object.

Returns:

List[str]: A list of fields that are allowed to be updated in this class.
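PhysicalWorkerModel is the only model in this module whose fields hold datetime objects (heartbeat_timestamp and latest_start_time). As written, to_json() and dump_to_json_file() hand the output of asdict straight to json.dumps / json.dump, and the json module cannot encode datetime values, so those helpers would raise TypeError for this model; whether Merlin ever calls them on physical workers is not shown here. The sketch below reproduces the failure with a hypothetical `SketchWorker` stand-in and shows one possible workaround, a `default=` hook that emits ISO 8601 strings. It is an illustration, not necessarily what Merlin does.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime


@dataclass
class SketchWorker:
    """Hypothetical stand-in for PhysicalWorkerModel, keeping one datetime field."""

    name: str = None
    heartbeat_timestamp: datetime = field(default_factory=datetime.now)


def encode_datetime(obj):
    """`default=` hook for json.dumps: turn datetimes into ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


worker = SketchWorker(name="celery@w1.host", heartbeat_timestamp=datetime(2024, 1, 2, 3, 4, 5))

# Mirrors to_json(): json.dumps(asdict(...)) with no custom encoder
try:
    json.dumps(asdict(worker))
    plain_dump_failed = False
except TypeError as exc:
    plain_dump_failed = True
    print(f"TypeError: {exc}")

encoded = json.dumps(asdict(worker), default=encode_datetime)
print(encoded)  # {"name": "celery@w1.host", "heartbeat_timestamp": "2024-01-02T03:04:05"}

# Reading it back yields a string, so the datetime has to be rebuilt explicitly
decoded_timestamp = datetime.fromisoformat(json.loads(encoded)["heartbeat_timestamp"])
```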

RunModel dataclass

Bases: BaseDataModel

A dataclass to store all of the information for a run.

Attributes:

additional_data (Dict): For any extra data not explicitly defined.
child (str): The ID of the child run (if any).
fields_allowed_to_be_updated (List[str]): A list of field names that are allowed to be updated.
id (str): The unique ID for the run.
parameters (Dict): The parameters used in this run.
parent (str): The ID of the parent run (if any).
queues (List[str]): The task queues used for this run.
run_status (str): The current status of the run, stored as the value of a RunStatus member. Defaults to RunStatus.INITIALIZED.value.
samples (Dict): The samples used in this run.
steps (List[str]): A list of unique step IDs that are executed in this run. Each ID will correspond to a StepInfo entry.
study_id (str): The unique ID of the study this run is associated with. Corresponds with a StudyModel entry.
workers (List[str]): A list of worker IDs executing tasks for this run. Each ID will correspond with a LogicalWorkerModel entry.
workspace (str): The path to the output workspace.

Source code in merlin/db_scripts/data_models.py
@dataclass
class RunModel(BaseDataModel):  # pylint: disable=too-many-instance-attributes
    """
    A dataclass to store all of the information for a run.

    Attributes:
        additional_data (Dict): For any extra data not explicitly defined.
        child (str): The ID of the child run (if any).
        fields_allowed_to_be_updated (List[str]): A list of field names that are allowed
            to be updated.
        id (str): The unique ID for the run.
        parameters (Dict): The parameters used in this run.
        parent (str): The ID of the parent run (if any).
        queues (List[str]): The task queues used for this run.
        run_status (str): The current status of the run, stored as the value of a
            `common.enums.RunStatus` member.
        samples (Dict): The samples used in this run.
        steps (List[str]): A list of unique step IDs that are executed in this run.
            Each ID will correspond to a `StepInfo` entry.
        study_id (str): The unique ID of the study this run is associated with.
            Corresponds with a `StudyModel` entry.
        workers (List[str]): A list of worker IDs executing tasks for this run. Each ID
            will correspond with a `LogicalWorkerModel` entry.
        workspace (str): The path to the output workspace.
    """

    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # pylint: disable=invalid-name
    study_id: str = None
    workspace: str = None
    steps: List[str] = field(default_factory=list)  # TODO NOT YET IMPLEMENTED
    queues: List[str] = field(default_factory=list)
    workers: List[str] = field(default_factory=list)
    parent: str = None  # TODO NOT YET IMPLEMENTED; do we even have a good way that this and `child` can be set?
    child: str = None  # TODO NOT YET IMPLEMENTED
    run_status: str = field(default=RunStatus.INITIALIZED.value)
    parameters: Dict = field(default_factory=dict)  # TODO NOT YET IMPLEMENTED
    samples: Dict = field(default_factory=dict)  # TODO NOT YET IMPLEMENTED

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        """
        Define the fields that are allowed to be updated for a `RunModel` object.

        Returns:
            A list of fields that are allowed to be updated in this class.
        """
        return ["parent", "child", "run_status", "additional_data", "workers"]

fields_allowed_to_be_updated property

Define the fields that are allowed to be updated for a RunModel object.

Returns:

List[str]: A list of fields that are allowed to be updated in this class.

StudyModel dataclass

Bases: BaseDataModel

A dataclass to store all of the information for a study.

Attributes:

additional_data (Dict): For any extra data not explicitly defined.
fields_allowed_to_be_updated (List[str]): A list of field names that are allowed to be updated.
id (str): The unique ID for the study.
name (str): The name of the study.
runs (List[str]): A list of runs associated with this study.

Source code in merlin/db_scripts/data_models.py
@dataclass
class StudyModel(BaseDataModel):
    """
    A dataclass to store all of the information for a study.

    Attributes:
        additional_data (Dict): For any extra data not explicitly defined.
        fields_allowed_to_be_updated (List[str]): A list of field names that are
            allowed to be updated.
        id (str): The unique ID for the study.
        name (str): The name of the study.
        runs (List[str]): A list of runs associated with this study.
    """

    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # pylint: disable=invalid-name
    name: str = None
    runs: List[str] = field(default_factory=list)

    @property
    def fields_allowed_to_be_updated(self) -> List[str]:
        """
        Define the fields that are allowed to be updated for a `StudyModel` object.

        Returns:
            A list of fields that are allowed to be updated in this class.
        """
        return ["runs"]

fields_allowed_to_be_updated property

Define the fields that are allowed to be updated for a StudyModel object.

Returns:

List[str]: A list of fields that are allowed to be updated in this class.
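The models refer to one another through ID strings rather than by holding objects: RunModel.study_id points at a StudyModel, and StudyModel.runs lists the study's runs. The sketch below assumes that runs stores run IDs (the field is typed List[str]) and uses hypothetical stand-ins, `SketchStudy` and `SketchRun`, with a made-up workspace path; it only illustrates the two-way link, not how Merlin's database layer records it.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchStudy:
    """Hypothetical stand-in for StudyModel."""

    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    name: str = None
    runs: List[str] = field(default_factory=list)


@dataclass
class SketchRun:
    """Hypothetical stand-in for RunModel, reduced to its study link."""

    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    study_id: str = None
    workspace: str = None


study = SketchStudy(name="hello_study")
run = SketchRun(study_id=study.id, workspace="studies/hello_study_run1")  # hypothetical path
# Record the run's ID on the study, matching the `runs` field's List[str] type
study.runs.append(run.id)

print(run.study_id == study.id)  # True
print(study.runs == [run.id])  # True
```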