Merlin

Empower your projects with Merlin, a cloud-based workflow manager designed to facilitate scalable and reproducible workflows, particularly suited for running many simulations and iterative procedures.


Why Merlin?

Workflows, applications, and machines are becoming more complex, yet subject matter experts need to devote their time and attention to their applications and often require fine-grained, command-line-level control. They rarely have time to spare for learning workflow systems.

With the expansion of data-driven computing, the HPC scientist needs to be able to run more simulations through complex multi-component workflows.

Merlin targets HPC workflows that require many simulations.¹

Goals and Motivations

Merlin was created to provide flexible and reproducible workflows at a scale that could be much larger than Maestro's. Because Merlin is built as an extension of Maestro, we wanted to maintain Maestro's goals and motivations while also giving users the ability to become their own big-data generators.

In the pursuit of flexible and reproducible workflows, Merlin places a paramount emphasis on workflow provenance. We recognize the importance of understanding how workflows evolve, ensuring that every decision, parameter adjustment, and execution is meticulously documented. Workflow provenance is not just a feature for us; it's a fundamental element that contributes to the reliability and trustworthiness of your studies.

Merlin understands the dynamic nature of your work, especially when dealing with large-scale simulations. Our goal is to provide a platform that seamlessly scales to accommodate the computational demands of extensive simulations, ensuring that your workflows remain efficient and effective, even in the face of substantial computational requirements.

Getting Started

Install Merlin

Merlin can be installed via pip in your own virtual environment.

First, create a virtual environment:
```
python -m venv merlin_venv
```

Now activate the virtual environment:

If you use bash:

source merlin_venv/bin/activate

If you use csh:

source merlin_venv/bin/activate.csh

Finally, install Merlin with pip:
```
pip install merlin
```

Create a Containerized Server

First, let's create a folder to store our server files and our examples. We'll also move into this directory:

mkdir merlin_examples ; cd merlin_examples/

Now let's set up a containerized server that Merlin can connect to.

Initialize the server files:
```
merlin server init
```
Start the server:
```
merlin server start
```
Copy the app.yaml configuration file from merlin_server/ to your current directory:
```
cp merlin_server/app.yaml .
```

Check that your server connection is working properly:

merlin info

Both the broker and results server connections should report OK:

.
.
.
Checking server connections:
----------------------------
broker server connection: OK
results server connection: OK
.
.
.
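
If you want to script this check, for example before launching a large study, the connection lines can be parsed from the captured output of merlin info. The following is a minimal sketch, not part of Merlin: the function name parse_connection_status is hypothetical, it assumes the output format shown above, and it treats any status other than OK as a failed connection.

```python
import re

# Matches lines such as "broker server connection: OK".
# Assumption: the line format is exactly as shown in the sample output above.
CONNECTION_LINE = re.compile(
    r"^[ \t]*(\w[\w ]*?) server connection:[ \t]*(\S+)", re.MULTILINE
)


def parse_connection_status(info_output: str) -> dict:
    """Map each server name to True if its connection line reports OK."""
    return {
        name: status == "OK"
        for name, status in CONNECTION_LINE.findall(info_output)
    }


SAMPLE_OUTPUT = """\
Checking server connections:
----------------------------
broker server connection: OK
results server connection: OK
"""

if __name__ == "__main__":
    status = parse_connection_status(SAMPLE_OUTPUT)
    # Report any server whose connection line did not say OK.
    failed = [name for name, ok in status.items() if not ok]
    print("all connections OK" if not failed else f"failed: {failed}")
```

In practice the text passed to the function would be the captured standard output of merlin info rather than the hard-coded sample.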

Run an Example Workflow

Let's download Merlin's built-in "Hello, World!" example:

merlin example hello

Now that we've downloaded the example, enter the hello/ directory:

cd hello/

This directory contains two files, hello.yaml and hello_samples.yaml. These are Merlin specification (spec) files. The hello.yaml spec is a very basic example that will also work with Maestro. We'll focus on hello_samples.yaml here, as it has more Merlin-specific features:

description:  # (1)
    name: hello_samples
    description: a very simple merlin workflow, with samples

env:
    variables:  # (2)
        N_SAMPLES: 3

global.parameters:
    GREET:  # (3)
        values : ["hello","hola"]
        label  : GREET.%%

study:
    - name: step_1
      description: say hello
      run:  # (4)
          cmd: |
            echo "$(GREET), $(WORLD)!"

    - name: step_2
      description: print a success message
      run:  # (5)
          cmd: print("Hurrah, we did it!")
          depends: [step_1_*]  # (6)
          shell: /usr/bin/env python3

merlin:
    resources:
        workers:  # (7)
            demo_worker:
                args: -l INFO --concurrency=1
                steps: [all]
    samples:  # (8)
        generate:
            cmd: python3 $(SPECROOT)/make_samples.py --filepath=$(MERLIN_INFO)/samples.csv --number=$(N_SAMPLES)
        file: $(MERLIN_INFO)/samples.csv
        column_labels: [WORLD]

The numbered comments in the spec correspond to the following notes:

(1) Mandatory name and description fields encourage well-documented workflows.
(2) Define single-valued variable tokens for use in your workflow steps.
(3) Define parameter tokens of the form $(NAME), along with lists of values to use in your steps, so that Merlin can parameterize them for you.
(4) Here, cmd is a multiline string written in bash, to harness the robust existing ecosystem of tools users are already familiar with.
(5) Here, cmd is a single-line string written in Python. Merlin allows users to change the shell that cmd uses to execute a step.
(6) Specify step dependencies using the steps' name values to control execution order.
(7) Define custom workers to process your workflow in the most efficient manner.
(8) Generate samples to be used throughout your workflow. Samples can be used like parameters, via the $(SAMPLE_NAME) syntax (as seen with $(WORLD) in step_1).
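
The samples block calls make_samples.py with --filepath and --number and then reads a single column labeled WORLD from the resulting CSV. The script that ships with the example is not shown here, so the following is only a sketch of a generator with the same command-line interface. The pool of values in WORLDS, the helper names, and the choice of a header-less one-column CSV are all assumptions made for illustration.

```python
import argparse
import csv
import random

# Hypothetical pool of values for the WORLD column; any strings would do.
WORLDS = ["Earth", "Mars", "Venus", "Jupiter", "Saturn"]


def make_samples(number, seed=None):
    """Return `number` one-column rows, each holding one value for WORLD."""
    rng = random.Random(seed)
    return [[rng.choice(WORLDS)] for _ in range(number)]


def write_samples(filepath, rows):
    """Write the rows as CSV without a header line (assumed layout)."""
    with open(filepath, "w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)


def main(argv=None):
    """Parse the two flags used in the spec and write the samples file."""
    parser = argparse.ArgumentParser(description="Write a one-column samples CSV")
    parser.add_argument("--filepath", required=True, help="output CSV path")
    parser.add_argument("--number", type=int, default=1, help="number of samples")
    args = parser.parse_args(argv)
    write_samples(args.filepath, make_samples(args.number))
```

Calling main() under an if __name__ == "__main__": guard would turn this into a command-line script. Calling main(["--filepath", "samples.csv", "--number", "3"]) directly writes three rows, which matches N_SAMPLES: 3 in the spec.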

There are two ways to run the hello_samples.yaml example: in a distributed manner or locally.

In a distributed manner:

Send tasks to the broker:

merlin run hello_samples.yaml

Start the workers to execute the tasks:

merlin run-workers hello_samples.yaml

Locally:

Execute the tasks locally, without needing to communicate with the containerized server we just established:

merlin run --local hello_samples.yaml

Running the workflow will first convert your steps into a task execution graph and then create a workspace directory with the results of running your study.

The directed acyclic graph (DAG) that's created for the hello_samples.yaml example will look like so:

[Figure: DAG for the hello_samples.yaml example]
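
In text form, the shape of this graph can be sketched in a few lines of Python. This is an illustration of the idea and not Merlin's implementation. It assumes that every GREET value is combined with every sample, and that depends: [step_1_*] makes step_2 wait for all expanded instances of step_1. The task names, the WORLD values, and the render and build_graph helpers are hypothetical.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product


def render(cmd, tokens):
    """Replace each $(NAME) token with its value; leave unknown tokens as-is."""
    return re.sub(
        r"\$\((\w+)\)",
        lambda match: str(tokens.get(match.group(1), match.group(0))),
        cmd,
    )


def build_graph(greets, worlds):
    """Return (dependencies, commands) for the expanded hello_samples study."""
    commands = {}
    for greet, (index, world) in product(greets, enumerate(worlds)):
        # Illustrative task name; Merlin's own naming may differ.
        name = f"step_1/GREET.{greet}/{index:02d}"
        commands[name] = render(
            'echo "$(GREET), $(WORLD)!"', {"GREET": greet, "WORLD": world}
        )
    # Each expanded step_1 task has no prerequisites ...
    dependencies = {name: set() for name in commands}
    # ... and step_2 depends on every one of them (assumed meaning of step_1_*).
    dependencies["step_2"] = set(commands)
    return dependencies, commands


# Two GREET values from the spec and three hypothetical WORLD samples.
deps, cmds = build_graph(["hello", "hola"], ["Earth", "Mars", "Venus"])
order = list(TopologicalSorter(deps).static_order())
```

With two parameter values and three samples, the sketch produces six step_1 tasks followed by a single step_2 task, which is always last in order because it depends on all of the others.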

If the run was successful, a workspace named hello_samples_<timestamp>/ should have been created. Its expected contents are shown below:

[Figure: Contents of hello_samples_<timestamp>]
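
To inspect the workspace from a script rather than by hand, a small helper can list its contents. The sketch below is generic Python and not part of Merlin; tree is a hypothetical helper name, and the directory you pass to it is whichever hello_samples_<timestamp>/ folder your run produced.

```python
from pathlib import Path


def tree(root, prefix=""):
    """Return the directory tree under `root` as a list of indented lines."""
    lines = []
    for entry in sorted(Path(root).iterdir(), key=lambda path: path.name):
        # Mark directories with a trailing slash, as in the workspace name.
        lines.append(f"{prefix}{entry.name}{'/' if entry.is_dir() else ''}")
        if entry.is_dir():
            # Recurse with one more level of indentation.
            lines.extend(tree(entry, prefix + "    "))
    return lines
```

Printing "\n".join(tree(workspace)), where workspace is the path to your run's directory, gives an indented listing of everything the study wrote.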

Release

Merlin is released under an MIT license. For more information, please see the LICENSE.

LLNL-CODE-797170

¹ See the paper "Enabling Machine Learning-Ready HPC Ensembles with Merlin", which mentions a study with up to 40 million simulations.