Variables
A Merlin spec file can contain a number of variables that control workflow execution, for example through string expansion and control flow.
Note
Only user variables and OUTPUT_PATH may be reassigned or overridden from the command line.
Token Syntax
Before covering which variables can be used with Merlin, let's first look at the syntax they share.
Merlin follows Maestro's minimalist token syntax for all variables. This includes Reserved Variables, User Variables, Step Return Variables, Parameters, and Samples. These variables are referenced in a spec using the $(TOKEN_NAME) syntax.
User Variable Example
A user variable can be defined in the env block of the spec, as discussed below. In this example, we set the variable MY_VARIABLE to the value 5.
    env:
      variables:
        MY_VARIABLE: 5
    study:
      - name: my_step
        description: example showcasing token syntax
        run:
          cmd: echo "The value of my variable is $(MY_VARIABLE)"
If we ran this study, my_step would produce a my_step.out file containing the string "The value of my variable is 5".
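The same $(TOKEN_NAME) syntax applies to parameters. As a minimal sketch, assume a hypothetical parameter named GREET declared in a global.parameters block; the step name, values, and label are illustrative and not taken from the example above:

```yaml
study:
  - name: greet
    description: hypothetical step that references a parameter token
    run:
      # $(GREET) is referenced exactly like a user variable
      cmd: echo "$(GREET), world!"

global.parameters:
  GREET:
    # Hypothetical values; each one is substituted for $(GREET)
    values: ["hello", "hola"]
    label: GREET.%%
```

Under this sketch, each parameter value would be substituted into its own copy of the step command.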
Directory Structure Context
The directory structure of Merlin output looks like this:
    SPECROOT
    └── <spec.yaml>
    ...
    OUTPUT_PATH
    └── MERLIN_WORKSPACE
        ├── MERLIN_INFO
        │   ├── <name>.orig.yaml
        │   ├── <name>.partial.yaml
        │   └── <name>.expanded.yaml
        ├── <step_name>.workspace
        └── WORKSPACE
Reserved Variables
Reserved variables are study variables that Merlin uses. They may be referenced within a spec file but typically cannot be reassigned or overridden. There are three exceptions to this rule: $(LAUNCHER), $(VLAUNCHER), and $(OUTPUT_PATH). All three of these variables can be modified.
Variable | Description | Example Expansion |
---|---|---|
$(LAUNCHER) | Abstracts HPC scheduler specific job launching wrappers such as srun (Slurm). See below for more info. | |
$(MERLIN_GLOB_PATH) | All of the directories in a simulation tree as a glob (*) string. | |
$(MERLIN_INFO) | Directory within MERLIN_WORKSPACE that holds the provenance specs and sample generation results. Commonly used to hold samples.npy. | |
$(MERLIN_PATHS_ALL) | A space delimited string of all of the paths; can be used as is in a bash for loop, for instance. | |
$(MERLIN_SAMPLE_ID) | Sample index in an ensemble. | |
$(MERLIN_SAMPLE_NAMES) | Names of Merlin sample values. | |
$(MERLIN_SAMPLE_PATH) | Path in the sample directory tree to a sample's directory, i.e. where the task is actually run. | |
$(MERLIN_SAMPLE_VECTOR) | Vector of Merlin sample values. | |
$(MERLIN_SPEC_ARCHIVED_COPY) | Archive version of MERLIN_SPEC_EXECUTED_RUN with all variables and paths fully resolved. | |
$(MERLIN_SPEC_EXECUTED_RUN) | Parsed and processed yaml file with command-line variable substitutions included. | |
$(MERLIN_SPEC_ORIGINAL_TEMPLATE) | Copy of original yaml file passed to merlin run. | |
$(MERLIN_TIMESTAMP) | The time a study began. May be used as a unique identifier. | |
$(MERLIN_WORKSPACE) | Output directory generated by a study at OUTPUT_PATH. Ends with MERLIN_TIMESTAMP. | |
$(OUTPUT_PATH) | Directory path that the study output will be written to. If not defined, this defaults to the current working directory. This value may be reassigned or overridden. | |
$(SPECROOT) | Directory path of the specification file. | |
$(VLAUNCHER) | The same as $(LAUNCHER) but allows for shell variable substitution. See below for more info. | |
$(WORKSPACE) | The workspace directory for the current step. | |
$(<step_name>.workspace) | Can be used in a step to reference the path to other previous step workspaces. | |
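As a minimal sketch of how a few of these reserved variables could be combined, consider the two hypothetical steps below. The step names and the file input.txt are assumptions made for illustration; only the variable tokens and the directory layout come from this page:

```yaml
study:
  - name: prepare
    description: hypothetical step that stages an input file
    run:
      cmd: |
        # Copy a (hypothetical) file that sits next to the spec file
        # into this step's own workspace
        cp $(SPECROOT)/input.txt $(WORKSPACE)/input.txt
        # List the provenance specs stored for this study
        ls $(MERLIN_INFO)

  - name: report
    description: hypothetical step that reads the output of prepare
    run:
      # $(prepare.workspace) points at the workspace of the earlier step
      cmd: cat $(prepare.workspace)/input.txt
      depends: [prepare]
```

The report step depends on prepare so that the referenced workspace exists before it is read.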
The LAUNCHER and VLAUNCHER Variables
$(LAUNCHER) is a special case of a reserved variable since its value can be changed. It serves as an abstraction to launch a job with parallel schedulers like Slurm, LSF, and Flux, and it can be used within a step command.
The arguments that the LAUNCHER variable can use are:
Argument Name | Description |
---|---|
procs | The total number of MPI tasks |
nodes | The total number of MPI nodes |
walltime | The total walltime of the run (hh:mm:ss or mm:ss or ss) (not available in LSF) |
cores per task | The number of hardware threads per MPI task |
gpus per task | The number of GPUs per MPI task |
LAUNCHER Example
Let's say we start with this run command inside our step:
    study:
      - name: LAUNCHER example
        description: An example step showcasing the LAUNCHER variable
        run:
          cmd: srun -N 1 -n 3 python script.py
We can modify this to use the $(LAUNCHER) variable like so:
    batch:
      type: slurm
    study:
      - name: LAUNCHER example
        description: An example step showcasing the LAUNCHER variable
        run:
          cmd: $(LAUNCHER) python script.py
          nodes: 1
          procs: 3
In other words, the $(LAUNCHER) variable here would be expanded to srun -N 1 -n 3, reproducing the original command.
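The remaining arguments from the table above are set in the same run section. The following is a sketch only: the step name and all values are hypothetical, and the exact launcher flags Merlin generates for them are not shown here because they depend on the scheduler:

```yaml
batch:
  type: slurm
study:
  - name: launcher_args_example
    description: hypothetical step that sets every LAUNCHER argument
    run:
      cmd: $(LAUNCHER) python script.py
      nodes: 2             # total number of nodes
      procs: 8             # total number of MPI tasks
      walltime: "00:30:00" # hh:mm:ss (not available in LSF)
      cores per task: 4    # hardware threads per MPI task
      gpus per task: 1     # GPUs per MPI task
```

Keeping these settings in the run section rather than in the command itself means the same step can be moved to another scheduler by changing only the batch type.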
The $(VLAUNCHER) variable behaves almost the same way as the $(LAUNCHER) variable. The key distinction lies in its source of information. Instead of drawing certain configuration options from the run section of a step, it retrieves specific shell variables. These shell variables are automatically generated by Merlin when you include the $(VLAUNCHER) variable in a step command, but they can also be customized by the user.
Currently, the supported shell variables are:
Variable | Description | Default |
---|---|---|
${MERLIN_NODES} | The number of nodes | 1 |
${MERLIN_PROCS} | The number of tasks/procs | 1 |
${MERLIN_CORES} | The number of cores per task/proc | 1 |
${MERLIN_GPUS} | The number of gpus per task/proc | 0 |
VLAUNCHER Example
Let's say we have the following defined in our yaml file:
    batch:
      type: flux
    study:
      - name: VLAUNCHER example
        description: An example step showcasing the VLAUNCHER variable
        run:
          cmd: |
            MERLIN_NODES=4
            MERLIN_PROCS=2
            MERLIN_CORES=8
            MERLIN_GPUS=2
            $(VLAUNCHER) python script.py
The $(VLAUNCHER) variable here would be expanded to a Flux launch command requesting 4 nodes, 2 tasks, 8 cores per task, and 2 GPUs per task.
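Because each shell variable has a default, a step presumably only needs to set the ones it wants to change. The sketch below uses a hypothetical step name and values, and relies on the defaults listed in the table above for the variables it leaves out:

```yaml
batch:
  type: flux
study:
  - name: vlauncher_defaults
    description: hypothetical step that overrides only nodes and tasks
    run:
      cmd: |
        MERLIN_NODES=2
        MERLIN_PROCS=4
        # MERLIN_CORES and MERLIN_GPUS are not set here, so the defaults
        # from the table (1 core per task, 0 GPUs per task) should apply
        $(VLAUNCHER) python script.py
```

Since the values are ordinary shell variables, they could also be computed inside the command before $(VLAUNCHER) is used.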
User Variables
User variables are variables defined in the env section of a spec file. They can be nested, as long as they are defined in order.
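A minimal sketch of such definitions follows. The variable names EXAMPLE_VAR and WORKER_NAME match the ones used later in this section, but the values assigned to them are assumptions made for illustration:

```yaml
env:
  variables:
    # A plain user variable (hypothetical value)
    EXAMPLE_VAR: hello
    # Defined after EXAMPLE_VAR, so it may reference it
    WORKER_NAME: $(EXAMPLE_VAR)_worker
```

With these hypothetical values, $(WORKER_NAME) would resolve to hello_worker.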
Like all other Merlin variables, user variables may be used anywhere (as a yaml key or value) within a specification as below:
    study:
      - name: VLAUNCHER example
        description: An example step showcasing the VLAUNCHER variable
        run:
          cmd: echo "$(EXAMPLE_VAR), world!"
    merlin:
      resources:
        workers:
          $(WORKER_NAME):
            args: -l INFO
            steps: [all]
If you want to define the study name programmatically, you can include variables in the description.name field, as long as the result is a valid filename. For example, a name built from user variables could expand to a study called my_hello_study_42.
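As a sketch of how a name like my_hello_study_42 could be assembled, assume two hypothetical user variables, EXAMPLE_VAR and NUM, with the values shown:

```yaml
description:
  # Expands to my_hello_study_42 with the values below
  name: my_$(EXAMPLE_VAR)_study_$(NUM)
  description: hypothetical study whose name is built from user variables

env:
  variables:
    EXAMPLE_VAR: hello
    NUM: 42
```

Values containing characters that are not valid in filenames, such as slashes, would be a poor fit for this field.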
Environment Variables
Merlin expands Unix environment variables for you when they appear in the values of user variables.
However, Merlin leaves environment variables found in shell scripts (think cmd and restart) alone, so they reach the shell unexpanded and are resolved when the step runs.
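To illustrate both behaviors, here is a minimal sketch. The variable MY_HOME and the step show_paths are hypothetical names chosen for illustration:

```yaml
env:
  variables:
    # $HOME appears in a user variable value, so Merlin expands it
    MY_HOME: $HOME

study:
  - name: show_paths
    description: hypothetical step mixing a user variable and a shell variable
    run:
      # $(MY_HOME) is substituted by Merlin; $PATH is left for the shell
      cmd: echo $(MY_HOME); echo $PATH
```

Under the rules described above, Merlin would replace $HOME in the value of MY_HOME and substitute $(MY_HOME) into cmd, while $PATH would stay as written for the shell to resolve.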
Step Return Variables
When a Merlin step finishes executing, a return code is provided by Merlin behind the scenes. This return code is used to determine what to do upon step completion.
If necessary, users can raise their own return codes within steps. The table below lists all Merlin return codes and an example of how to raise each one.
Variable | Description | Example Usage |
---|---|---|
$(MERLIN_SUCCESS) | This step was successful. Keep going to the next task. Default step behavior if no exit code is given. | |
$(MERLIN_RESTART) | Run this step's restart command, or re-run cmd if restart is absent. The default maximum number of retries+restarts for any given step is 30. You can override this by adding a max_retries field under the run field in the specification. Issues a warning. By default, the restart happens after 1 second. To override the delay time, specify retry_delay. | |
$(MERLIN_RETRY) | Retry this step's cmd command. The default maximum number of retries for any given step is 30. You can override this by adding a max_retries field under the run field in the specification. Issues a warning. By default, the retry happens after 1 second. To override the delay time, specify retry_delay. | |
$(MERLIN_SOFT_FAIL) | Mark this step as a failure and note it in the warning log, but keep executing the workflow. Unknown return codes get translated to soft fails, so that they can be logged. | |
$(MERLIN_HARD_FAIL) | Something went terribly wrong and we need to stop the whole workflow. Raises a HardFailException and stops all workers connected to that step. Workers will stop after a 60 second delay to allow the step to be acknowledged by the server. Note: workers in isolated parts of the workflow not consuming from the bad step will continue. You can stop all workers with $(MERLIN_STOP_WORKERS). | |
$(MERLIN_STOP_WORKERS) | Launch a task to stop all active workers. The stop happens after 60 seconds to allow the current task to finish and acknowledge its results to the server. | |
$(MERLIN_RAISE_ERROR) | Purposefully raise a general exception. This is intended to be used for testing; you'll likely want to use $(MERLIN_SOFT_FAIL) instead. | |
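As a sketch of how a step might raise these return codes itself, the hypothetical step below exits with $(MERLIN_RETRY) until an expected file appears and with $(MERLIN_SUCCESS) once it does. The step name, the script, and the file results.txt are illustrative, and placing retry_delay next to max_retries under run is an assumption:

```yaml
study:
  - name: wait_for_results
    description: hypothetical step that retries until results.txt exists
    run:
      cmd: |
        python script.py
        if [ ! -f results.txt ]; then
          # Ask Merlin to run cmd again
          exit $(MERLIN_RETRY)
        fi
        exit $(MERLIN_SUCCESS)
      max_retries: 5   # overrides the default of 30
      retry_delay: 10  # seconds to wait before retrying (assumed placement)
```

Exiting with $(MERLIN_SOFT_FAIL) instead of $(MERLIN_RETRY) would record the failure and let the rest of the workflow continue.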