Querying Queues and Workers
Managing task queues and monitoring the workers attached to them is a common need in many applications. Merlin offers two commands for this: Queue Information (merlin queue-info) and Query Workers (merlin query-workers).
This module describes how to use each command to retrieve information about task queues and to query workers.
Queue Information
Merlin provides users with the merlin queue-info command to help monitor celery queues. This command lists queue statistics in a table with the following columns: queue name, number of tasks in the queue, and number of workers connected to the queue.
The default functionality of this command is to display queue statistics for active queues. Active queues are any queues that have a worker watching them.
Usage:
Basic Options
The queue-info command comes equipped with some basic options:
--dump: Dump the queue information to a .csv or .json file
--specific-queues: Only obtain queue information for queues you list here
--task-server: Modify the task server value
Dump
Much like the two status commands, the queue-info command provides a way to dump the queue statistics to an output file.
When dumping to a file that does not yet exist, Merlin will create that file for you and populate it with the queue statistics you requested.
When dumping to a file that does exist, Merlin will append the requested queue statistics to that file. You can differentiate between separate dump calls by looking at the timestamps of the dumps. For CSV files this timestamp exists in the Time column (see CSV Dump Format below) and for JSON files this timestamp will be the top-level key of the queue info entry (see JSON Dump Format below).
Using any of the --specific-queues, --spec, or --steps options will modify the output that's written to the output file.
CSV Dump Format
A CSV dump file for queue information contains a Time column plus a <queue_name>:tasks column and a <queue_name>:consumers column for each queue that's listed in the queue-info output at the time of your dump.
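As a rough illustration of working with such a file, the sketch below parses CSV dump text into a nested dictionary keyed by timestamp. The sample data, the queue names, and the `parse_queue_csv` helper are hypothetical, and the sketch assumes the file has a single header row and that timestamps use the zero-padded YYYY-MM-DD HH:MM:SS form.

```python
import csv
import io

# Hypothetical dump contents; queue names and counts are made up for illustration.
SAMPLE_CSV = """Time,[merlin]_train:tasks,[merlin]_train:consumers,[merlin]_predict:tasks,[merlin]_predict:consumers
2024-01-01 12:00:00,5,1,0,1
2024-01-01 12:05:00,2,1,3,1
"""


def parse_queue_csv(text):
    """Return {timestamp: {queue: {"tasks": int, "consumers": int}}} from CSV dump text."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        timestamp = row.pop("Time")
        queues = {}
        for column, value in row.items():
            if not value:
                continue  # skip cells left empty for this dump
            # Split on the last colon so the queue name itself is kept whole.
            queue_name, _, field = column.rpartition(":")
            queues.setdefault(queue_name, {})[field] = int(value)
        result[timestamp] = queues
    return result


dumps = parse_queue_csv(SAMPLE_CSV)
latest = max(dumps)  # zero-padded timestamps sort chronologically as strings
print(latest, dumps[latest])
```

Keying on the Time column keeps separate dump calls apart, which matters because repeated dumps are appended to the same file.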
The image below shows an example of dumping the queue statistics of active queues to a csv file, and then displaying that csv file using the rich-cli library:
JSON Dump Format
The format of a JSON dump file for queue information is as follows:
{
    "YYYY-MM-DD HH:MM:SS": {
        "[merlin]_queue_name": {
            "tasks": <integer>,
            "consumers": <integer>
        }
    }
}
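A minimal sketch of reading this layout is shown below. The sample values and the `latest_queue_info` and `idle_queues` helpers are hypothetical; the only thing taken from the format above is the nesting of timestamp, queue name, and the tasks and consumers fields.

```python
import json

# Hypothetical dump contents following the layout above; values are made up.
SAMPLE_JSON = """
{
    "2024-01-01 12:00:00": {"[merlin]_train": {"tasks": 5, "consumers": 1}},
    "2024-01-01 12:05:00": {"[merlin]_train": {"tasks": 2, "consumers": 0}}
}
"""


def latest_queue_info(text):
    """Return (timestamp, queue_info) for the most recent dump in the JSON text."""
    dumps = json.loads(text)
    timestamp = max(dumps)  # zero-padded timestamps sort chronologically as strings
    return timestamp, dumps[timestamp]


def idle_queues(queue_info):
    """Return the names of queues that have no consumers in one dump entry."""
    return [name for name, stats in queue_info.items() if stats["consumers"] == 0]


timestamp, info = latest_queue_info(SAMPLE_JSON)
print(timestamp, idle_queues(info))
```

Because each dump call adds a new top-level timestamp key, picking the largest key is one simple way to find the most recent entry.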
The image below shows an example of dumping the queue info to a json file, and then displaying that json file using the rich-cli library:
Specific Queues
If you know exactly which queues you want to check on, you can use the --specific-queues option to list one or more queues to view.
Usage:
Example Queue-Info Output Using Specific-Queues With Active Queues
In the example below, we're querying the train and predict queues, which both currently have a worker watching them (in other words, they are active).
If you ask for queue-info of inactive queues with the --specific-queues option, the output is still displayed as a table.
Example Queue-Info Output Using Specific-Queues With Inactive Queues
In the example below, we're querying the train and predict queues, neither of which currently has a worker watching it (in other words, they are inactive).
Task Server
To modify the task server from the command line, you can use the --task-server option. However, celery is currently the only available task server, so you will most likely not need this option.
Usage:
Specification Options
There are three options for the queue-info command that revolve around using a spec file to query queue information:
--spec: Obtain queue information for all queues defined in a spec file
--steps: Obtain queue information for queues attached to certain steps in a spec file
--vars: Modify environment variables in a spec from the command line
Note
The --steps and --vars options must be used alongside the --spec option. They cannot be used by themselves.
Spec
Using the --spec option allows you to query queue statistics only for the queues that exist in the spec file you provide. This is the same functionality that the merlin status command provided prior to the release of Merlin v1.12.0.
Usage:
Example Queue-Info Output Using the --spec Option
The example below displays queue information for all queues in the hello.yaml spec file.
Steps
Warning
This option must be used alongside the --spec option.
If you'd like to see queue information for queues that are attached to specific steps in your workflow, use the --steps option.
Usage:
Example Queue-Info Output Using the --steps Option
Say we have a spec file with steps named step_1 through step_4, and each step is attached to a different task queue, step_1_queue through step_4_queue respectively. Using the --steps option for these two steps gives us:
Vars
Warning
This option must be used alongside the --spec option.
The --vars option can be used to modify any variables defined in your spec file from the command line interface. This option takes a space-delimited list of variables and their assignments, and should be given after the input yaml file.
Usage:
Example of Using --vars With Queue-Info
Say we have the following spec file with a variable QUEUE_NAME that's referenced in step_2:
description:
    name: vars_demo
    description: a very simple merlin workflow
env:
    variables:
        QUEUE_NAME: step_2_queue
study:
    - name: step_1
      description: say hello
      run:
        cmd: echo "hello!"
        task_queue: step_1_queue
    - name: step_2
      description: print a success message
      run:
        cmd: print("Hurrah, we did it!")
        depends: [step_1_*]
        shell: /usr/bin/env python3
        task_queue: $(QUEUE_NAME)
If we wanted to change this queue name at the command line, we could do so with the --vars option of the merlin run and merlin run-workers commands:
Now if we were to query the queue information without using the same --vars argument:
...we would see step_1_queue and step_2_queue but not new_queue_name:
This is because modifying a variable from the command line does not change the original spec file.
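The effect can be modelled with a small sketch. This is not Merlin's implementation: the `expand` helper is hypothetical and only imitates the idea of substituting $(NAME) tokens from the spec's variables, with command-line overrides applied on top and the original values left untouched.

```python
import re

# Variable as defined in the spec's env block, and the task_queue value of step_2.
SPEC_VARIABLES = {"QUEUE_NAME": "step_2_queue"}
TASK_QUEUE = "$(QUEUE_NAME)"


def expand(text, variables, overrides=None):
    """Replace $(NAME) tokens using variables, with overrides taking priority."""
    merged = {**variables, **(overrides or {})}  # new dict; the inputs are not mutated
    # Unknown names are left as-is by returning the original token.
    return re.sub(r"\$\((\w+)\)", lambda m: merged.get(m.group(1), m.group(0)), text)


with_override = expand(TASK_QUEUE, SPEC_VARIABLES, {"QUEUE_NAME": "new_queue_name"})
without_override = expand(TASK_QUEUE, SPEC_VARIABLES)

print(with_override)     # new_queue_name
print(without_override)  # step_2_queue
```

Since the override lives only in the call that supplied it, any later command that reads the same spec file without the override resolves the queue to its original name.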
With that in mind, let's run this again, this time with the --vars option:
...which should show us a worker watching the new_queue_name queue:
Query Workers
Merlin provides the merlin query-workers command to help users see which workers are running and what task queues they're watching.
This command outputs a table with two columns: workers and queues. The workers column contains one connected worker per row. The queues column contains a comma-delimited list of the queues that the connected worker is watching.
Usage:
Example Query-Workers Output With No Connected Workers
Example Query-Workers Output With Connected Workers
Query Workers by Spec File
When you use the --spec option with the query-workers command, the query is restricted to the workers defined in the provided spec file. If none of the workers defined in the spec file have been started, a message indicating this is displayed.
Usage:
Example of Using the --spec Option With Query-Workers
For this example let's start with two spec files, demo_workflow.yaml and demo_workflow_2.yaml. In demo_workflow.yaml we'll have two workers, trainer and predictor, and in demo_workflow_2.yaml we'll have one worker, worker1.
Here, the trainer worker is watching the create_data and train steps, which both have task queues of the same name. Therefore, the trainer worker will be watching the create_data and train task queues. Similarly, the predictor worker will be watching the predict and verify task queues.
description:
    name: demo_workflow
    description: a very simple merlin workflow
study:
    - name: create_data
      description: create data for our model to train with
      run:
        cmd: echo "creating data"
        task_queue: create_data
    - name: train
      description: train a model
      run:
        cmd: echo "training a model on the data"
        task_queue: train
    - name: predict
      description: predict on unseen data
      run:
        cmd: echo "predict on new, unseen data"
        task_queue: predict
    - name: verify
      description: verify the validity of our study
      run:
        cmd: echo "verify our study succeeded"
        task_queue: verify
merlin:
    resources:
        workers:
            trainer:
                args: -l INFO
                steps: [create_data, train]
            predictor:
                args: -l INFO
                steps: [predict, verify]
In demo_workflow_2.yaml, the worker1 worker is assigned to all steps, which here means both step_1 and step_2. Therefore, this worker will be connected to both step_1_queue and step_2_queue.
description:
    name: $(NAME)
    description: a very simple merlin workflow
env:
    variables:
        NAME: hello
        OUTPUT_PATH: ./studies
global.parameters:
    GREET:
        values : ["hello","hola"]
        label  : GREET.%%
    WORLD:
        values : ["world","mundo"]
        label  : WORLD.%%
study:
    - name: step_1
      description: say hello
      run:
        cmd: echo "$(GREET), $(WORLD)!"
        task_queue: step_1_queue
    - name: step_2
      description: print a success message
      run:
        cmd: print("Hurrah, we did it!")
        depends: [step_1_*]
        shell: /usr/bin/env python3
        task_queue: step_2_queue
merlin:
    resources:
        workers:
            worker1:
                args: -l INFO
                steps: [all]
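The step-to-queue reasoning used for both spec files can be sketched as follows. The `queues_for_workers` helper and the hand-written dictionaries are hypothetical stand-ins for the two specs above, not Merlin's own code, and the sketch ignores any prefix Merlin may add to queue names.

```python
def queues_for_workers(steps, workers):
    """Map each worker name to the task queues of the steps it is assigned."""
    step_queues = {step["name"]: step["task_queue"] for step in steps}
    watched = {}
    for worker, assigned in workers.items():
        # "all" is treated here as shorthand for every step in the study.
        names = list(step_queues) if "all" in assigned else assigned
        # dict.fromkeys drops duplicate queue names while keeping their order.
        watched[worker] = list(dict.fromkeys(step_queues[name] for name in names))
    return watched


# Stand-in for demo_workflow.yaml
DEMO_STEPS = [
    {"name": "create_data", "task_queue": "create_data"},
    {"name": "train", "task_queue": "train"},
    {"name": "predict", "task_queue": "predict"},
    {"name": "verify", "task_queue": "verify"},
]
DEMO_WORKERS = {"trainer": ["create_data", "train"], "predictor": ["predict", "verify"]}

# Stand-in for demo_workflow_2.yaml
DEMO_2_STEPS = [
    {"name": "step_1", "task_queue": "step_1_queue"},
    {"name": "step_2", "task_queue": "step_2_queue"},
]
DEMO_2_WORKERS = {"worker1": ["all"]}

demo = queues_for_workers(DEMO_STEPS, DEMO_WORKERS)
demo_2 = queues_for_workers(DEMO_2_STEPS, DEMO_2_WORKERS)
print(demo)
print(demo_2)
```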
Let's start these workers with:
Now if we used the query-workers command without the --spec option, we'd see all three workers across both workflows: trainer, predictor, and worker1:
Great, but what if we wanted to see just the workers for demo_workflow.yaml? We can accomplish this by using the --spec option:
Now, we'll notice that the only workers being displayed are trainer and predictor:
Query Workers by Queues
In Merlin, each newly started worker is linked to task queues, either the ones you assign or, if you assign none, ones designated automatically. The --queues option of the query-workers command lets you query workers based on the queues they are connected to.
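Conceptually, this kind of query keeps every worker that is connected to at least one of the requested queues. The sketch below models that idea only; the `workers_on_queues` helper is hypothetical, the worker and queue names are illustrative, and Merlin's actual selection logic may differ.

```python
def workers_on_queues(worker_queues, wanted):
    """Keep workers connected to at least one of the wanted queues."""
    wanted = set(wanted)
    return {
        worker: queues
        for worker, queues in worker_queues.items()
        if wanted.intersection(queues)
    }


# Illustrative mapping of worker name to the queues it watches.
WORKER_QUEUES = {
    "creator": ["create_data"],
    "trainer": ["train"],
    "predictor": ["predict"],
    "verifier": ["verify"],
}

selected = workers_on_queues(WORKER_QUEUES, ["train", "predict"])
print(sorted(selected))
```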
Usage:
Example of Using the --queues Option With Query-Workers
Say we have the spec file below with four workers, creator, trainer, predictor, and verifier, each attached to its respective step and task queue. In other words, creator will be connected to the create_data task queue, trainer will be connected to the train task queue, and so on:
description:
    name: demo_workflow
    description: a very simple merlin workflow
study:
    - name: create_data
      description: create data for our model to train with
      run:
        cmd: echo "creating data"
        task_queue: create_data
    - name: train
      description: train a model
      run:
        cmd: echo "training a model on the data"
        task_queue: train
    - name: predict
      description: predict on unseen data
      run:
        cmd: echo "predict on new, unseen data"
        task_queue: predict
    - name: verify
      description: verify the validity of our study
      run:
        cmd: echo "verify our study succeeded"
        task_queue: verify
merlin:
    resources:
        workers:
            creator:
                args: -l INFO
                steps: [create_data]
            trainer:
                args: -l INFO
                steps: [train]
            predictor:
                args: -l INFO
                steps: [predict]
            verifier:
                args: -l INFO
                steps: [verify]
We can start these workers with:
Now if we query the workers without the --queues option, we'll see all four workers alive and connected to their respective queues:
Let's refine this query to just view the workers connected to the train and predict queues:
As we can see in the output below, only the trainer and predictor workers are now displayed:
Query Workers by Worker Regex
There will be times when you know precisely which workers you want to query. In such cases, use the --workers option of the query-workers command. This option queries workers using regular expressions. Because a plain string is also a valid regular expression, you can query workers by their exact names as well.
Usage:
Example of Using the --workers Option With Query-Workers
Say we have the following spec file with three workers, step_1_worker, step_2_worker, and other_worker:
description:
    name: demo_workflow
    description: a very simple merlin workflow
study:
    - name: step_1
      description: A step following the `step_*` name pattern
      run:
        cmd: echo "step 1"
        task_queue: step_1_queue
    - name: step_2
      description: A step following the `step_*` name pattern
      run:
        cmd: echo "step 2"
        task_queue: step_2_queue
    - name: other_step
      description: A step with a different name
      run:
        cmd: echo "other step"
        task_queue: predict
merlin:
    resources:
        workers:
            step_1_worker:
                args: -l INFO
                steps: [step_1]
            step_2_worker:
                args: -l INFO
                steps: [step_2]
            other_worker:
                args: -l INFO
                steps: [other_step]
Once the workers are started for this workflow, we can then query for our step_1_worker and step_2_worker with:
In our output we see that both workers that we asked for were queried but other_worker was ignored:
Alternatively, we can do the exact same query using a regular expression:
The ^ operator in a regular expression matches the beginning of a string. In this example, when we say ^step we're asking Merlin to match any worker whose name starts with the word step, which in this case is step_1_worker and step_2_worker. We can see this in the output below:
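Separately from Merlin's own output, the pattern itself can be checked with Python's re module. This is only a sketch: the `match_workers` helper is hypothetical, and it assumes patterns are matched from the start of each worker name, which may not be exactly how Merlin applies them.

```python
import re

WORKERS = ["step_1_worker", "step_2_worker", "other_worker"]


def match_workers(patterns, names):
    """Return the names matched, from their first character, by any pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [name for name in names if any(c.match(name) for c in compiled)]


# Querying by full worker names and by a regular expression selects the same workers.
by_name = match_workers(["step_1_worker", "step_2_worker"], WORKERS)
by_regex = match_workers(["^step"], WORKERS)

print(by_name)
print(by_regex)
```

Both calls select step_1_worker and step_2_worker and skip other_worker, since other_worker does not begin with step.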