Variables
A Merlin input .yaml file can use several kinds of variables to control workflow execution, for example through string expansion and control flow.
Note: Only user variables and OUTPUT_PATH may be reassigned or overridden from the command line.
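As a sketch of what overriding looks like in practice, the fragment below defines OUTPUT_PATH and one user variable, and the comments show a command line that would replace both at run time. N_SAMPLES and my_spec.yaml are hypothetical names chosen for illustration; the example assumes the --vars option of merlin run, which takes space-separated NAME=value pairs.

```yaml
# Defaults are used by:  merlin run my_spec.yaml
# Both are replaced by:  merlin run my_spec.yaml --vars OUTPUT_PATH=./other_studies N_SAMPLES=50
env:
    variables:
        OUTPUT_PATH: ./studies   # reserved, but may be overridden
        N_SAMPLES: 10            # hypothetical user variable
```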
Directory structure context
The directory structure of merlin output looks like this:
SPECROOT
    <spec.yaml>
    ...
OUTPUT_PATH
    MERLIN_WORKSPACE
        MERLIN_INFO
            <name>.orig.yaml
            <name>.partial.yaml
            <name>.expanded.yaml
        <step_name>.workspace
        WORKSPACE
Reserved variables
Variable | Description | Example Expansion |
---|---|---|
$(SPECROOT) | Directory path of the specification file. | /globalfs/user/merlin_workflows |
$(OUTPUT_PATH) | Directory path the study output will be written to. If not defined, it defaults to the current working directory. May be reassigned or overridden. | ./studies |
$(MERLIN_TIMESTAMP) | The time a study began. May be used as a unique identifier. | "YYYYMMDD-HHMMSS" |
$(MERLIN_WORKSPACE) | Output directory generated by a study at $(OUTPUT_PATH). Its name ends with $(MERLIN_TIMESTAMP). | $(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP) |
$(WORKSPACE) | The workspace directory for a single step. | $(OUTPUT_PATH)/ensemble_name_$(MERLIN_TIMESTAMP)/step_name/ |
$(MERLIN_INFO) | Directory within $(MERLIN_WORKSPACE) that holds the copies of the specification file. | $(MERLIN_WORKSPACE)/merlin_info/ |
$(MERLIN_SAMPLE_ID) | Sample index in an ensemble. | 0 1 2 3 |
$(MERLIN_SAMPLE_PATH) | Path in the sample directory tree to a sample's directory, i.e. where the task is actually run. | /0/0/0/ /0/0/1/ /0/0/2/ /0/0/3/ |
$(MERLIN_GLOB_PATH) | All of the directories in a simulation tree as a glob (*) string. | /*/*/*/* |
$(MERLIN_PATHS_ALL) | A space-delimited string of all of the paths. It can be used as-is in a bash for loop, for instance: for path in $(MERLIN_PATHS_ALL); do ls $path; done | 0/0/0 0/0/1 0/0/2 0/0/3 |
$(MERLIN_SAMPLE_VECTOR) | Vector of merlin sample values. | $(SAMPLE_COLUMN_1) $(SAMPLE_COLUMN_2) ... |
$(MERLIN_SAMPLE_NAMES) | Names of merlin sample values. | SAMPLE_COLUMN_1 SAMPLE_COLUMN_2 ... |
$(MERLIN_SPEC_ORIGINAL_TEMPLATE) | Copy of the original yaml file passed to merlin run. | $(MERLIN_INFO)/*.orig.yaml |
$(MERLIN_SPEC_EXECUTED_RUN) | Parsed and processed yaml file with command-line variable substitutions included. | $(MERLIN_INFO)/*.partial.yaml |
$(MERLIN_SPEC_ARCHIVED_COPY) | Archive version of $(MERLIN_SPEC_EXECUTED_RUN) with all variables fully expanded. | $(MERLIN_INFO)/*.expanded.yaml |
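To show how reserved variables are typically referenced, here is a minimal sketch of a step that uses $(SPECROOT), $(WORKSPACE), and $(MERLIN_INFO) in its command. The step name and the input.deck file are hypothetical and only serve as illustration.

```yaml
study:
    - name: stage_inputs
      description: use reserved variables inside a step command
      run:
          cmd: |
              # copy a hypothetical input file that sits next to the spec file
              # into this step's own workspace directory
              cp $(SPECROOT)/input.deck $(WORKSPACE)/
              # list the copies of the spec that the study keeps in merlin_info
              ls $(MERLIN_INFO)
```

Merlin substitutes each $(...) reference before the command is handed to the shell, so the script only ever sees plain paths.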
The LAUNCHER and VLAUNCHER Variables
$(LAUNCHER) is a special case of a reserved variable since its value can be changed.
It serves as an abstraction for launching a job with parallel schedulers like slurm,
lsf, and flux, and it can be used within a step command. For example,
say we start with this run cmd inside our step:
run:
    cmd: srun -N 1 -n 3 python script.py
We can modify this to use the $(LAUNCHER) variable like so:
batch:
    type: slurm
run:
    cmd: $(LAUNCHER) python script.py
    nodes: 1
    procs: 3
In other words, the $(LAUNCHER) variable would become srun -N 1 -n 3.
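Because the scheduler-specific command is generated from the batch and run settings, the same step can in principle be moved to another scheduler by changing only the batch type. The following is a minimal sketch that assumes a flux allocation is available; the exact command that $(LAUNCHER) expands to depends on the scheduler and is deliberately not shown.

```yaml
batch:
    type: flux    # previously: slurm

run:
    # the step command itself does not change
    cmd: $(LAUNCHER) python script.py
    nodes: 1
    procs: 3
```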
The $(VLAUNCHER) variable behaves similarly to the $(LAUNCHER) variable.
The key distinction lies in its source of information. Instead of drawing certain configuration
options from the run section of a step, it retrieves specific shell variables. These shell
variables are automatically generated by Merlin when you include the $(VLAUNCHER) variable
in a step command, but they can also be customized by the user. Currently, the following shell
variables are supported:
Variable | Description | Default |
---|---|---|
MERLIN_NODES | The number of nodes | 1 |
MERLIN_PROCS | The number of tasks/procs | 1 |
MERLIN_CORES | The number of cores per task/proc | 1 |
MERLIN_GPUS | The number of gpus per task/proc | 0 |
Let’s say we have the following defined in our yaml file:
batch:
    type: flux
run:
    cmd: |
        MERLIN_NODES=4
        MERLIN_PROCS=2
        MERLIN_CORES=8
        MERLIN_GPUS=2
        $(VLAUNCHER) python script.py
The $(VLAUNCHER) variable would expand to flux run -N 4 -n 2 -c 8 -g 2.
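Since each shell variable has a default, a step only needs to set the values that differ from the table above. The sketch below reuses script.py from the previous example, sets the node and task counts, and leaves MERLIN_CORES and MERLIN_GPUS at their defaults of 1 and 0. The specific numbers are arbitrary.

```yaml
batch:
    type: flux

run:
    cmd: |
        # only override what differs from the defaults
        MERLIN_NODES=2
        MERLIN_PROCS=4
        # MERLIN_CORES and MERLIN_GPUS are not set here, so their defaults apply
        $(VLAUNCHER) python script.py
```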
User variables
User variables are defined in the env section of a specification file, as in this example:
env:
    variables:
        ID: 42
        EXAMPLE_VAR: hello
As long as they’re defined in order, you can nest user variables like this:
env:
    variables:
        EXAMPLE_VAR: hello
        WORKER_NAME: $(EXAMPLE_VAR)_worker
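Nesting is not limited to one level, and a user variable can also build on a reserved variable. Here is a minimal sketch in which SCRIPTS, RUN_SCRIPT, and the file name are hypothetical; each variable is defined before the line that uses it.

```yaml
env:
    variables:
        SCRIPTS: $(SPECROOT)/scripts          # reserved variable inside a user variable
        RUN_SCRIPT: $(SCRIPTS)/run_model.py   # SCRIPTS is already defined on the line above
```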
Like all other Merlin variables, user variables may be used anywhere (as a yaml key or value) within a specification as below:
cmd: echo "$(EXAMPLE_VAR), world!"
...
$(WORKER_NAME):
    args: ...
If you want to programmatically define the study name, you can include variables
in the description.name field as long as it makes a valid filename:
description:
    name: my_$(EXAMPLE_VAR)_study_$(ID)
    description: example of programmatic study name
The above would produce a study called my_hello_study_42.
Environment variables
Merlin expands Unix environment variables for you. The values of the user variables below would be expanded:
env:
    variables:
        MY_HOME: ~/
        MY_PATH: $PATH
        USERNAME: ${USER}
However, Merlin leaves environment variables found in shell scripts (think cmd and restart) alone.
So this step:
- name: step1
  description: an example
  run:
      cmd: echo $PATH ; echo $(MY_PATH)
…would be expanded as:
- name: step1
  description: an example
  run:
      cmd: echo $PATH ; echo /an/example/:/path/string/
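The practical consequence is that the same environment variable can be captured at two different times. In the sketch below, SUBMIT_USER is a hypothetical user variable: its value is fixed when Merlin expands the specification, while the plain $USER and $HOSTNAME references are left for the shell to resolve wherever the step actually runs.

```yaml
env:
    variables:
        SUBMIT_USER: ${USER}   # expanded by Merlin when the spec is processed

study:
    - name: whoami
      description: compare spec-time and run-time expansion
      run:
          cmd: |
              # $(SUBMIT_USER) was already replaced by Merlin
              echo "submitted by $(SUBMIT_USER)"
              # $USER and $HOSTNAME are left alone and resolved by the shell at run time
              echo "running as $USER on $HOSTNAME"
```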
Step return variables
$(MERLIN_SUCCESS): This step was successful. Keep going to the next task. This is the default step behavior if no exit code is given. Example usage:
    echo "hello, world!"
    exit $(MERLIN_SUCCESS)
$(MERLIN_RESTART): Run this step's restart command, or re-run cmd if restart is absent. The max_retries field under run limits how many times this can happen. Example usage:
    run:
        cmd: |
            touch my_file.txt
            echo "hi mom!" >> my_file.txt
            exit $(MERLIN_RESTART)
        restart: |
            echo "bye, mom!" >> my_file.txt
        max_retries: 23
        retry_delay: 10
$(MERLIN_RETRY): Retry this step's cmd command. The max_retries field under run limits the number of retries. Example usage:
    run:
        cmd: |
            touch my_file.txt
            echo "hi mom!" >> my_file.txt
            exit $(MERLIN_RETRY)
        max_retries: 23
        retry_delay: 10
$(MERLIN_SOFT_FAIL): Mark this step as a failure and note it in the warning log, but keep going. Unknown return codes get translated to soft fails so that they can be logged. Example usage:
    echo "Uh-oh, this sample didn't work"
    exit $(MERLIN_SOFT_FAIL)
$(MERLIN_HARD_FAIL): Something went terribly wrong and I need to stop the whole workflow. Raises a HardFailException. Note: workers in isolated parts of the workflow that are not consuming from the bad step will continue. You can stop all workers with $(MERLIN_STOP_WORKERS). Example usage:
    echo "Oh no, we've created skynet! Abort!"
    exit $(MERLIN_HARD_FAIL)
$(MERLIN_STOP_WORKERS): Launch a task to stop all active workers. To allow the current task to finish and acknowledge its results to the server, the stop happens after 60 seconds. Example usage:
    # send a signal to all workers to stop
    exit $(MERLIN_STOP_WORKERS)
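A single step can choose between several of these return variables depending on what happened. The following is a minimal sketch rather than a recommended policy: run_simulation.sh and output.dat are hypothetical names, and the retry limit is arbitrary.

```yaml
study:
    - name: run_and_check
      description: pick a return variable based on the outcome of a command
      run:
          cmd: |
              # run_simulation.sh is a hypothetical script
              if ! ./run_simulation.sh; then
                  # treat a non-zero exit as transient and ask for cmd to be run again
                  exit $(MERLIN_RETRY)
              fi
              if [ ! -s output.dat ]; then
                  # the script ran but produced an empty or missing result: log it and move on
                  exit $(MERLIN_SOFT_FAIL)
              fi
              exit $(MERLIN_SUCCESS)
          max_retries: 5
```

Reserving $(MERLIN_HARD_FAIL) for conditions that invalidate the whole study keeps a single bad sample from stopping the rest of the workflow.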