Variables

A Merlin spec file can contain several kinds of variables that control workflow execution, for example through string expansion and control flow.

Note

Only user variables and OUTPUT_PATH may be reassigned or overridden from the command line.

Token Syntax

Before we discuss what variables can be used with Merlin, let's first discuss the syntax for variables.

Merlin follows Maestro's minimalist token syntax for all variables. This includes Reserved Variables, User Variables, Step Return Variables, Parameters, and Samples. These variables are referenced in a spec using the $(TOKEN_NAME) syntax.

User Variable Example

A user variable can be defined in the env block of the spec, as discussed below. In this example, we set a variable MY_VARIABLE to the value 5.

env:
    variables:
        MY_VARIABLE: 5

study:
    - name: my_step
      description: example showcasing token syntax
      run:
        cmd: echo "The value of my variable is $(MY_VARIABLE)"

If we ran this study, my_step would produce a my_step.out file containing the string "The value of my variable is 5".
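
Because MY_VARIABLE is a user variable, its value can also be supplied at launch time instead of editing the spec. The sketch below assumes the spec is saved as my_spec.yaml (a hypothetical filename) and uses the --vars option of merlin run. The command is shown as a comment because the block itself is the spec fragment.

```yaml
# Hypothetical filename: my_spec.yaml
# Launch with a command-line override of the user variable:
#   merlin run my_spec.yaml --vars MY_VARIABLE=7
env:
    variables:
        MY_VARIABLE: 5   # default value, used when no override is given

study:
    - name: my_step
      description: example showcasing a command-line override
      run:
        cmd: echo "The value of my variable is $(MY_VARIABLE)"
```

With the override in place, my_step.out would be expected to contain "The value of my variable is 7" rather than 5.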

Directory Structure Context

The directory structure of Merlin output looks like this:

SPECROOT
└── <spec.yaml>

...

OUTPUT_PATH
└── MERLIN_WORKSPACE
    ├── MERLIN_INFO
    │   ├── <name>.orig.yaml
    │   ├── <name>.partial.yaml
    │   └── <name>.expanded.yaml
    ├── <step_name>.workspace
    └── WORKSPACE

Reserved Variables

Reserved variables are study variables that Merlin uses. They may be referenced within a spec file but typically cannot be reassigned or overridden. There are three exceptions to this rule: $(LAUNCHER), $(VLAUNCHER), and $(OUTPUT_PATH), all of which can be modified.

Variable Description Example Expansion
$(LAUNCHER)
Abstracts HPC scheduler-specific job-launching wrappers such as srun (Slurm). See below for more info.
srun -N 1 -n 3
$(MERLIN_GLOB_PATH)
All of the directories in a simulation tree as a glob (*) string.
/*/*/*/*
$(MERLIN_INFO)
Directory within MERLIN_WORKSPACE that holds the provenance specs and sample generation results. Commonly used to hold samples.npy.
$(MERLIN_WORKSPACE)/merlin_info/
$(MERLIN_PATHS_ALL)
A space-delimited string of all of the paths. It can be used as is in a bash for loop, for instance:
for path in $(MERLIN_PATHS_ALL)
do
    ls $path
done
0/0/0
0/0/1
0/0/2
0/0/3
$(MERLIN_SAMPLE_ID)
Sample index in an ensemble.
0 1 2 3
$(MERLIN_SAMPLE_NAMES)
Names of merlin sample values.
SAMPLE_COLUMN_1 SAMPLE_COLUMN_2 ...
$(MERLIN_SAMPLE_PATH)
Path in the sample directory tree to a sample's directory, i.e. where the task is actually run.
/0/0/0/ /0/0/1/ /0/0/2/ /0/0/3/
$(MERLIN_SAMPLE_VECTOR)
Vector of merlin sample values.
$(SAMPLE_COLUMN_1) $(SAMPLE_COLUMN_2) ...
$(MERLIN_SPEC_ARCHIVED_COPY)
Archived version of MERLIN_SPEC_EXECUTED_RUN with all variables and paths fully resolved.
$(MERLIN_INFO)/*.expanded.yaml
$(MERLIN_SPEC_EXECUTED_RUN)
Parsed and processed yaml file with command-line variable substitutions included.
$(MERLIN_INFO)/*.partial.yaml
$(MERLIN_SPEC_ORIGINAL_TEMPLATE)
Copy of original yaml file passed to merlin run.
$(MERLIN_INFO)/*.orig.yaml
$(MERLIN_TIMESTAMP)
The time a study began. May be used as a unique identifier.
"YYYYMMDD-HHMMSS"
$(MERLIN_WORKSPACE)
Output directory generated by a study at OUTPUT_PATH. Ends with MERLIN_TIMESTAMP.
$(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)
$(OUTPUT_PATH)
Directory path that the study output will be written to. If not defined, this defaults to the current working directory. This value may be reassigned or overridden.
./studies
$(SPECROOT)
Directory path of the specification file.
/globalfs/user/merlin_workflows
$(VLAUNCHER)
The same as $(LAUNCHER) but allows for shell variable substitution. See below for more info.
srun -N 1 -n 3
$(WORKSPACE)
The workspace directory for the current step.
$(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)/current_step_name/
$(<step_name>.workspace)
Can be used in a step to reference the path to a previous step's workspace.

Note

step_name is the name key in each study step.

$(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)/step_name/
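
As a minimal sketch of how several of these reserved variables fit together, the fragment below has one step write a file into its own workspace and a second step read it back through $(<step_name>.workspace). The step names and the file name are hypothetical.

```yaml
study:
    - name: produce
      description: write a file into this step's workspace
      run:
        cmd: |
            # WORKSPACE is the directory of the step that is currently running
            echo "spec lives in $(SPECROOT)" > $(WORKSPACE)/notes.txt

    - name: consume
      description: read the file written by the previous step
      run:
        cmd: |
            # produce.workspace is the workspace of the step named "produce"
            cat $(produce.workspace)/notes.txt
        depends: [produce]
```

The depends entry makes consume wait for produce, so the file exists by the time it is read.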

The LAUNCHER and VLAUNCHER Variables

$(LAUNCHER) is a special case of a reserved variable since its value can be changed. It serves as an abstraction for launching jobs with parallel schedulers like Slurm, LSF, and Flux, and it can be used within a step command.

The arguments that the LAUNCHER variable can use are:

Argument Name     Description
procs             The total number of MPI tasks
nodes             The total number of MPI nodes
walltime          The total walltime of the run (hh:mm:ss or mm:ss or ss) (not available in LSF)
cores per task    The number of hardware threads per MPI task
gpus per task     The number of GPUs per MPI task

LAUNCHER Example

Let's say we start with this run command inside our step:

study:
    - name: LAUNCHER example
      description: An example step showcasing the LAUNCHER variable
      run:
        cmd: srun -N 1 -n 3 python script.py

We can modify this to use the $(LAUNCHER) variable like so:

batch:
    type: slurm

study:
    - name: LAUNCHER example
      description: An example step showcasing the LAUNCHER variable
      run:
        cmd: $(LAUNCHER) python script.py
        nodes: 1
        procs: 3

In other words, the $(LAUNCHER) variable here would be expanded to:

srun -N 1 -n 3
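
The remaining arguments are set the same way, as keys in the step's run section. The fragment below is a sketch that assumes the keys are spelled exactly as the argument names in the table above. The flags that $(LAUNCHER) expands to depend on the scheduler named in the batch block, so no expansion is shown here.

```yaml
batch:
    type: slurm

study:
    - name: launcher_args_example
      description: a step that sets every LAUNCHER argument
      run:
        cmd: $(LAUNCHER) python script.py
        nodes: 2
        procs: 8
        walltime: "00:30:00"   # hh:mm:ss
        cores per task: 4
        gpus per task: 1
```

Keeping the resource request in these keys rather than in a hard-coded srun line means the same step can be reused under a different scheduler by changing only batch.type.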

Similarly, the $(VLAUNCHER) variable behaves almost the same way as the $(LAUNCHER) variable. The key distinction lies in its source of information. Instead of drawing certain configuration options from the run section of a step, it retrieves specific shell variables. These shell variables are automatically generated by Merlin when you include the $(VLAUNCHER) variable in a step command, but they can also be customized by the user.

Currently, the following shell variables are supported:

Variable           Description                          Default
${MERLIN_NODES}    The number of nodes                  1
${MERLIN_PROCS}    The number of tasks/procs            1
${MERLIN_CORES}    The number of cores per task/proc    1
${MERLIN_GPUS}     The number of gpus per task/proc     0

VLAUNCHER Example

Let's say we have the following defined in our yaml file:

batch:
    type: flux

study:
    - name: VLAUNCHER example
      description: An example step showcasing the VLAUNCHER variable
      run:
        cmd: |
            MERLIN_NODES=4
            MERLIN_PROCS=2
            MERLIN_CORES=8
            MERLIN_GPUS=2
            $(VLAUNCHER) python script.py

The $(VLAUNCHER) variable here would be expanded to:

flux run -N 4 -n 2 -c 8 -g 2
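
Because the values come from shell variables, they can be chosen while the step runs. The sketch below sets only MERLIN_NODES and MERLIN_PROCS and leaves the other variables at their defaults from the table above. The input file name is hypothetical.

```yaml
batch:
    type: flux

study:
    - name: vlauncher_dynamic_example
      description: choose launch resources at run time
      run:
        cmd: |
            # Request more resources only when a (hypothetical) large input is present
            if [ -f large_input.dat ]; then
                MERLIN_NODES=4
                MERLIN_PROCS=16
            else
                MERLIN_NODES=1
                MERLIN_PROCS=4
            fi
            $(VLAUNCHER) python script.py
```

This kind of run-time decision is not possible with $(LAUNCHER), which reads fixed values from the run section.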

User Variables

User variables are variables defined in the env section of a spec file, as in this example:

env:
    variables:
        ID: 42
        EXAMPLE_VAR: hello

As long as they're defined in order, you can nest user variables like so:

env:
    variables:
        EXAMPLE_VAR: hello
        WORKER_NAME: $(EXAMPLE_VAR)_worker

Like all other Merlin variables, user variables may be used anywhere (as a yaml key or value) within a specification as below:

study:
    - name: user_variable_example
      description: An example step showcasing a user variable
      run:
        cmd: echo "$(EXAMPLE_VAR), world!"

merlin:
    resources:
        workers:
            $(WORKER_NAME):
                args: -l INFO
                steps: [all]

If you want to programmatically define the study name, you can include variables in the description.name field as long as it makes a valid filename:

description:
    name: my_$(EXAMPLE_VAR)_study_$(ID)
    description: example of programmatic study name

The above would produce a study called my_hello_study_42.

Environment Variables

Merlin expands Unix environment variables for you. The values of the user variables below would be expanded:

env:
    variables:
        MY_HOME: ~/
        MY_PATH: $PATH
        USERNAME: ${USER}

However, Merlin leaves environment variables found in shell scripts (think cmd and restart) alone. So this step:

study:
    - name: step1
      description: an example
      run:
        cmd: echo $PATH ; echo $(MY_PATH)

...would be expanded as:

study:
    - name: step1
      description: an example
      run:
        cmd: echo $PATH ; echo /an/example/:/path/string/
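
Environment variables can also be combined with other text in a user variable. The fragment below is a sketch that assumes Merlin expands ~ and ${HOME} when they appear as part of a longer value, as it does for the standalone values above. DATA_DIR and the directory names are hypothetical.

```yaml
env:
    variables:
        OUTPUT_PATH: ~/merlin_studies   # assumed to be expanded by Merlin
        DATA_DIR: ${HOME}/data          # hypothetical user variable, assumed to be expanded by Merlin

study:
    - name: list_data
      description: mix a Merlin-expanded variable with a shell variable
      run:
        cmd: |
            # DATA_DIR is substituted by Merlin before the script runs
            ls $(DATA_DIR)
            # USER is left untouched by Merlin and resolved by the shell at run time
            echo "running as $USER"
```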

Step Return Variables

When a Merlin step finishes executing, a return code is provided by Merlin behind the scenes. This return code is used to determine what to do upon step completion.

If necessary, users can raise their own return codes within steps. The table below lists all Merlin return codes and an example of how to raise each one.

Variable Description Example Usage
$(MERLIN_SUCCESS)
This step was successful. Keep going to the next task. This is the default step behavior if no exit code is given.
echo "hello, world!"
exit $(MERLIN_SUCCESS)
$(MERLIN_RESTART)
Run this step's restart command, or re-run cmd if restart is absent. The default maximum number of retries plus restarts for any given step is 30; you can override this by adding a max_retries field under the run field in the specification. Issues a warning. By default, the restart happens after 1 second; to override the delay, specify retry_delay.
run:
  cmd: |
      touch my_file.txt
      echo "hi mom!" >> my_file.txt
      exit $(MERLIN_RESTART)
  restart: |
      echo "bye, mom!" >> my_file.txt
  max_retries: 23
  retry_delay: 10
$(MERLIN_RETRY)
Retry this step's cmd command. The default maximum number of retries for any given step is 30; you can override this by adding a max_retries field under the run field in the specification. Issues a warning. By default, the retry happens after 1 second; to override the delay, specify retry_delay.
run:
  cmd: |
      touch my_file.txt
      echo "hi mom!" >> my_file.txt
      exit $(MERLIN_RETRY)
  max_retries: 23
  retry_delay: 10
$(MERLIN_SOFT_FAIL)
Mark this step as a failure and note it in the warning log, but keep executing the workflow. Unknown return codes are translated to soft fails so that they can be logged.
echo "Uh-oh, this sample didn't work"
exit $(MERLIN_SOFT_FAIL)
$(MERLIN_HARD_FAIL)
Something went terribly wrong and we need to stop the whole workflow. Raises a HardFailException and stops all workers connected to that step. Workers will stop after a 60 second delay to allow the step to be acknowledged by the server.

Note

Workers in isolated parts of the workflow that are not consuming from the bad step will continue. You can stop all workers with $(MERLIN_STOP_WORKERS).

echo "Oh no, we've created skynet! Abort!"
exit $(MERLIN_HARD_FAIL)
$(MERLIN_STOP_WORKERS)
Launch a task to stop all active workers. The stop happens after a 60-second delay so that the current task can finish and acknowledge its results to the server.
# send a signal to all workers to stop
exit $(MERLIN_STOP_WORKERS)
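
The return codes can also be chosen conditionally inside a single step. The fragment below is a sketch of that pattern: it maps the exit status of a script to a Merlin return code. The script name and the meaning given to status 2 are hypothetical conventions, not part of Merlin.

```yaml
study:
    - name: run_sim
      description: choose a Merlin return code from a script's exit status
      run:
        cmd: |
            # simulate.py is a hypothetical script
            python simulate.py
            status=$?
            if [ "$status" -eq 0 ]; then
                exit $(MERLIN_SUCCESS)
            elif [ "$status" -eq 2 ]; then
                # hypothetical convention: 2 signals a transient failure worth retrying
                exit $(MERLIN_RETRY)
            else
                # record the failure but let the rest of the workflow continue
                exit $(MERLIN_SOFT_FAIL)
            fi
        max_retries: 5
        retry_delay: 10
```

Here max_retries and retry_delay only matter on the MERLIN_RETRY branch; the other two branches end the step immediately.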