Command line

The merlin executable defines a number of commands to create tasks, launch workers to run the tasks and remove tasks from the task server. The tasks are communicated to a task server, or broker, that are then requested by workers on an allocation to run. The celery python module is used to implement the tasks and worker functionality.

Help (merlin --help)

Descriptions of the Merlin commands are outputted when the -h or --help commands are used.

$ merlin [<command name>] --help

Version (merlin --version)

See the version by using the --version or -v flag.

$ merlin --version

Log Level (merlin -lvl debug)

More information, generally pertaining to bugs, can be output by increasing the logging level using the -lvl or --level argument.

Options for the level argument are: debug, info, warning, error.

$ merlin -lvl debug run <input.yaml>

Create the Config File (merlin config)

Create a default config file in the ${HOME}/.merlin directory using the config command. This file can then be edited for your system configuration.

$ merlin config [--task_server]  [--output_dir <dir>] [--broker <rabbitmq|redis>]

The --task_server option will select the appropriate configuration for the given task server. Currently only celery is implemented.

The --output_dir or -o will output the configuration in the given directory. This file can then be edited and copied into ${HOME}/.merlin.

The --broker command will write the initial app.yaml config file for a rabbitmq or redis broker. The default is rabbitmq. The backend will be redis in both cases. The redis backend in the rabbitmq config shows the use on encryption for the backend.

Generate working examples (merlin example)

If you want to run an example workflow, use Merlin’s merlin example:

$ merlin example list

This will list the available example workflows and a description for each one. To select one:

$ merlin example <example_name>

This will copy the example workflow to the current working directory. It is possible to specify another path to copy to.

$ merlin example <example_name> -p path/to/dir

If the specified directory does not exist Merlin will automatically create it.

This will generate the example workflow at the specified location, ready to be run.

Information (merlin info)

Information about your merlin and python configuration can be printed out by using the info command. This is helpful for debugging. Included in this command is a server check which will check for server connections. The connection check will timeout after 60 seconds.

$ merlin info

Monitor (merlin monitor)

Batch submission scripts may not keep the batch allocation alive if there is not a blocking process in the submission script. The merlin monitor command addresses this by providing a blocking process that checks for tasks in the queues every (sleep) seconds. When the queues are empty, the blocking process will exit and allow the allocation to end.

$ merlin monitor <input.yaml> [--steps <steps>] [--vars <VARIABLES=<VARIABLES>>] [--sleep <duration>][--task_server celery]

Use the --steps option to identify specific steps in the specification that you want to query.

The --vars option will specify desired Merlin variable values to override those found in the specification. The list is space-delimited and should be given after the input yaml file. Example: --vars LEARN=path/to/new_learn.py EPOCHS=3

The --sleep argument is the duration in seconds between checks for workers. The default is 60 seconds.

The only currently available option for --task_server is celery, which is the default when this flag is excluded.

The monitor function will check for celery workers for up to 10*(sleep) seconds before monitoring begins. The loop happens when the queue(s) in the spec contain tasks, but no running workers are detected. This is to protect against a failed worker launch.

Purging Tasks (merlin purge)

Once the merlin run command succeeds, the tasks are now on the task server waiting to be run by the workers. If you would like to remove the tasks from the server, then use the purge command.

Attention

Any tasks reserved by workers will not be purged from the queues. All workers must be first stopped so the tasks can be returned to the task server and then they can be purged.

You probably want to use merlin stop-workers first.

To purge all tasks in all queues defined by the workflow yaml file from the task server, run:

$ merlin purge <input.yaml> [-f] [--steps <steps>] [--vars <VARIABLES=<VARIABLES>>]

This will ask you if you would like to remove the tasks, you can use the -f option if you want to skip this.

If you have different queues in your workflow yaml file, you can choose which queues are purged by using the --steps argument and giving a space-delimited list of steps.

$ merlin purge <input.yaml> --steps step1 step2

The --vars option will specify desired Merlin variable values to override those found in the specification. The list is space-delimited and should be given after the input yaml file. Example: --vars QUEUE_NAME=new_queue EPOCHS=3

Searching for any workers (merlin query-workers)

If you want to see all workers that are currently connected to the task server you can use:

$ merlin query-workers

This will broadcast a command to all connected workers and print the names of any that respond and the queues they’re attached to. This is useful for interacting with workers, such as via merlin stop-workers --workers.

The --queues option will look for workers associated with the names of the queues you provide here. For example, if you want to see the names of all workers attached to the queues named demo and merlin you would use:

merlin query-workers --queues demo merlin

The --spec option will query for workers defined in the spec file you provide. For example, if simworker and nonsimworker are defined in a spec file called example_spec.yaml then to query for these workers you would use:

merlin query-workers --spec example_spec.yaml

The --workers option will query for workers based on the worker names you provide here. For example, if you wanted to query a worker named step_1_worker you would use:

merlin query-workers --workers step_1_worker

This flag can also take regular expressions as input. For instance, if you had several workers running but only wanted to find the workers whose names started with step you would use:

merlin query-workers --workers ^step

Restart the workflow (merlin restart)

To restart a previously started merlin workflow, use the restart command and the path to root of the merlin workspace that was generated during the previously run workflow. This will define the tasks and queue them on the task server also called the broker.

$ merlin restart [--local] <path/to/workspace_timestamp>

Merlin currently writes file called MERLIN_FINISHED to the directory of each step that was finished successfully. It uses this to determine which steps to skip during execution of a workflow.

The --local option will run tasks sequentially in your current shell.

Run the workflow (merlin run)

To run the merlin workflow use the run command and the path to the input yaml file <input.yaml>. This will define the tasks and queue them on the task server also called the broker.

$ merlin run [--local] <input.yaml> [--vars <VARIABLES=<VARIABLES>>] [--samplesfile <SAMPLES_FILE>] [--dry]

The --local option will run tasks sequentially in your current shell.

The --vars option will specify desired Merlin variable values to override those found in the specification. The list is space-delimited and should be given after the input yaml file. Example: --vars LEARN=path/to/new_learn.py EPOCHS=3

The --samplesfile will allow the user to specify a file containing samples. Valid choices: .npy, .csv, .tab. Should be given after the input yaml file.

The --no-errors option is used for testing, it will silence the errors thrown when flux is not present.

Dry Run

‘Dry run’ means telling workers to create a study’s workspace and all of its necessary subdirectories and scripts (with variables expanded) without actually executing the scripts.

To dry-run a workflow, use --dry:

$ merlin run --local --dry <input.yaml>

In a distributed fashion:

$ merlin run --dry <input.yaml> ; merlin run-workers <input.yaml>

You can also specify dry runs from the workflow specification file:

batch:
    dry_run: True

If you wish to execute a workflow after dry-running it, simply use restart.

Run the Workers (merlin run-workers)

The tasks queued on the broker are run by a collection of workers. These workers can be run local in the current shell or in parallel on a batch allocation. The workers are launched using the run-workers command which reads the configuration for the worker launch from the <input.yaml> file. The batch and merlin resources section are both used to configure the worker launch. The top level batch section can be overridden in the merlin workers resource section. Parallel workers should be scheduled using the system’s batch scheduler. Once the workers are running, tasks from the broker will be processed.

To launch workers for your workflow:

$ merlin run-workers [--echo]  <input.yaml> [--worker-args <worker args>] [--steps <WORKER_STEPS>] [--vars <VARIABLES=<VARIABLES>>]

The --echo option will echo the celery workers run command to stdout and not run any workers.

The --worker-args option will pass the values, in quotes, to the celery workers. Should be given after the input yaml file.

The --steps option is the specific steps in the input yaml file you want to run the corresponding workers. The default is ‘all’ steps. Should be given after the input yaml file.

The --vars option will specify desired Merlin variable values to override those found in the specification. The list is space-delimited and should be given after the input yaml file. Example: --vars LEARN=path/to/new_learn.py EPOCHS=3

An example of launching a simple celery worker using srun:

$ srun -n 1 celery -A merlin worker -l INFO

A parallel batch allocation launch is configured to run a single worker process per node. This worker process will then launch a number of worker threads to process the tasks. The number of threads can be configured by the users and will be the number of parallel jobs that can be run at once on the allocation plus threads for any non-parallel tasks. If there are 36 cores on a node and all the tasks are single core, the user may want to start 36 threads per node. If the parallel jobs uses 8 tasks, then the user should run 4 or 5 threads. For the celery workers the number of threads is set using the --concurrency argument, see the Configuring celery workers section.

A full SLURM batch submission script to run the workflow on 4 nodes is shown below.

#!/bin/bash
#SBATCH -N 4
#SBATCH -J Merlin
#SBATCH -t 30:00
#SBATCH -p pdebug
#SBATCH --mail-type=ALL
#SBATCH -o merlin_workers_%j.out

# Assumes you are run this in the same dir as the yaml file.
YAML_FILE=input.yaml

# Source the merlin virtualenv
source <path to merlin venv>/bin/activate

# Remove all tasks from the queues for this run.
#merlin purge -f ${YAML_FILE}

# Submit the tasks to the task server
merlin run  ${YAML_FILE}

# Print out the workers command
merlin run-workers  ${YAML_FILE} --echo

# Run the workers on the allocation
merlin run-workers  ${YAML_FILE}

# Delay until the workers cease running
merlin monitor

Status (merlin status)

$ merlin status <input.yaml> [--steps <steps>] [--vars <VARIABLES=<VARIABLES>>] [--csv <csv file>] [--task_server celery]

Use the --steps option to identify specific steps in the specification that you want to query.

The --vars option will specify desired Merlin variable values to override those found in the specification. The list is space-delimited and should be given after the input yaml file. Example: --vars LEARN=path/to/new_learn.py EPOCHS=3

The --csv option takes in a filename, to dump status reports to.

The only currently available option for --task_server is celery, which is the default when this flag is excluded.

Stopping workers (merlin stop-workers)

To send out a stop signal to some or all connected workers, use:

$ merlin stop-workers [--spec <input.yaml>] [--queues <queues>] [--workers <regex>] [--task_server celery]

The default behavior will send a stop to all connected workers across all workflows, having them shutdown softly.

The --spec option targets only workers named in the merlin block of the spec file.

The --queues option allows you to pass in the names of specific queues to stop. For example:

# Stop all workers on these queues, no matter their name
$ merlin stop-workers --queues queue1 queue2

The --workers option allows you to pass in regular expressions of names of workers to stop:

# Stop all workers whose name matches this pattern, no matter the queue
# Note the ".*" convention at the start, per regex
$ merlin stop-workers --workers ".*@my_other_host*"

The only currently available option for --task_server is celery, which is the default when this flag is excluded.

Attention

If you’ve named workers identically (you shouldn’t) only one might get the signal. In this case, you can send it again.

Hosting Local Server (merlin server)

To create a local server for merlin to connect to. Merlin server creates and configures a server on the current directory. This allows multiple instances of merlin server to exist for different studies or uses.

The init subcommand initalizes a new instance of merlin server.

The status subcommand checks to the status of the merlin server.

The start subcommand starts the merlin server.

The stop subcommand stops the merlin server.

The restart subcommand performs stop command followed by a start command on the merlin server.

The config subcommand edits configurations for the merlin server. There are multiple flags to allow for different configurations.

  • The -ip IPADDRESS, --ipaddress IPADDRESS option set the binded IP address for merlin server.

  • The -p PORT, --port PORT option set the binded port for merlin server.

  • The -pwd PASSWORD, --password PASSWORD option set the password file for merlin server.

  • The --add-user USER PASSWORD option add a new user for merlin server.

  • The --remove-user REMOVE_USER option remove an exisiting user from merlin server.

  • The -d DIRECTORY, --directory DIRECTORY option set the working directory for merlin server.

  • The -ss SNAPSHOT_SECONDS, --snapshot-seconds SNAPSHOT_SECONDS option set the number of seconds before each snapshot.

  • The -sc SNAPSHOT_CHANGES, --snapshot-changes SNAPSHOT_CHANGES option set the number of database changes before each snapshot.

  • The -sf SNAPSHOT_FILE, --snapshot-file SNAPSHOT_FILE option set the name of snapshots.

  • The -am APPEND_MODE, --append-mode APPEND_MODE option set the appendonly mode. Options are always, everysec, no.

  • The -af APPEND_FILE, --append-file APPEND_FILE option set the filename for server append/change file.

More information can be found on Merlin Server