dag
Holds the Merlin Directed Acyclic Graph (DAG) class.
DAG
This class provides methods on a task graph that Merlin needs for staging
tasks in Celery. It is initialized from a Maestro ExecutionGraph, and the
major entry point is the group_tasks method, which provides groups of
independent chains of tasks.
Attributes:
| Name | Type | Description |
|---|---|---|
backwards_adjacency |
Dict
|
A dictionary mapping each task to its parent tasks for reverse traversal. |
column_labels |
List[str]
|
A list of column labels provided in the spec file. |
maestro_adjacency_table |
OrderedDict
|
An ordered dict showing adjacency of nodes. Comes from
a maestrowf |
maestro_values |
OrderedDict
|
An ordered dict of the values at each node. Comes from a maestrowf
|
parameter_info |
Dict
|
A dict containing information about parameters in the study. |
study_name |
str
|
The name of the study. |
Methods:
| Name | Description |
|---|---|
calc_backwards_adjacency |
Initializes the backwards adjacency table. |
calc_depth |
Calculate the depth of the given node and its children. |
children |
Return the children of the task. |
compatible_merlin_expansion |
Check if two tasks are compatible for Merlin expansion. |
find_chain |
Find the chain containing the task. |
find_independent_chains |
Finds independent chains and adjusts with the groups of chains to maximize parallelism. |
group_by_depth |
Group Directed Acyclic Graph (DAG) tasks by depth. |
group_tasks |
Group independent tasks in a DAG. |
num_children |
Find the number of children for the given task in the DAG. |
num_parents |
Find the number of parents for the given task in the DAG. |
parents |
Return the parents of the task. |
step |
Return a |
Source code in merlin/study/dag.py
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 | |
__init__(maestro_adjacency_table, maestro_values, column_labels, study_name, parameter_info)
Initializes a Directed Acyclic Graph (DAG) object, which represents a task graph used by Merlin
for staging tasks in Celery. The DAG is initialized from a Maestro ExecutionGraph by unpacking
its adjacency table and node values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
maestro_adjacency_table
|
OrderedDict
|
An ordered dictionary representing the adjacency
relationships between tasks in the graph. This comes from a Maestro |
required |
maestro_values
|
OrderedDict
|
An ordered dictionary containing the values or metadata
associated with each task in the graph. This also comes from a Maestro |
required |
column_labels
|
List[str]
|
A list of column labels provided in the specification file, typically used to identify parameters or task attributes. |
required |
study_name
|
str
|
The name of the study to which this DAG belongs. |
required |
parameter_info
|
Dict
|
A dictionary containing information about the parameters in the study, such as their names and values. |
required |
Source code in merlin/study/dag.py
calc_backwards_adjacency()
Initializes the backwards adjacency table.
This method constructs a mapping of each task to its parent tasks in the Directed Acyclic Graph (DAG). The backwards adjacency table allows for reverse traversal of the graph, enabling the identification of dependencies for each task.
The method iterates through each parent task in the maestro_adjacency_table
and updates the backwards_adjacency dictionary. For each task that is a child
of a parent, it adds the parent to the list of that task's parents in the
backwards_adjacency table.
This is essential for operations that require knowledge of a task's dependencies, such as determining the order of execution or identifying independent tasks.
Example
If the maestro_adjacency_table is structured as follows:
After calling this method, the backwards_adjacency will be:
Source code in merlin/study/dag.py
calc_depth(node, depths, current_depth=0)
Calculate the depth of the given node and its children. This recursive
method will update depths in place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
node
|
str
|
The node to start at. |
required |
depths
|
Dict
|
The dictionary of depths to update. |
required |
current_depth
|
int
|
The current depth in the graph traversal. |
0
|
Source code in merlin/study/dag.py
children(task_name)
Return the children of the task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The name of the task to get the children of. |
required |
Returns:
| Type | Description |
|---|---|
List
|
List of children of this task. |
Source code in merlin/study/dag.py
compatible_merlin_expansion(task1, task2)
Check if two tasks are compatible for Merlin expansion.
This method compares the expansion needs of two tasks to determine if they can be expanded together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task1
|
str
|
The first task. |
required |
task2
|
str
|
The second task. |
required |
Returns:
| Type | Description |
|---|---|
bool
|
True if compatible, False otherwise. |
Source code in merlin/study/dag.py
find_chain(task_name, list_of_groups_of_chains)
staticmethod
Find the chain containing the task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The task to search for. |
required |
list_of_groups_of_chains
|
List[List[List]]
|
List of groups of chains to search for the task. |
required |
Returns:
| Type | Description |
|---|---|
List
|
The list representing the chain containing task_name, or None if not found. |
Source code in merlin/study/dag.py
find_independent_chains(list_of_groups_of_chains)
Finds independent chains and adjusts with the groups of chains to maximize parallelism.
This method looks for opportunities to move tasks in deeper groups of chains into chains in shallower groups, thus increasing available parallelism in execution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
list_of_groups_of_chains
|
List[List[List]]
|
List of list of lists, as returned by
|
required |
Returns:
| Type | Description |
|---|---|
List[List[List]]
|
Adjusted list of groups of chains to maximize parallelism. |
Example
Given input chains, the method may return a modified structure that allows for more tasks to be executed in parallel. For example, we might start with this:
and finish with this:
Source code in merlin/study/dag.py
group_by_depth(depths)
staticmethod
Group Directed Acyclic Graph (DAG) tasks by depth.
This method only groups by depth, and has one task in every chain.
find_independent_chains is used
to figure out how to coalesce chains across depths.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
depths
|
Dict
|
The dictionary of depths to group by. |
required |
Returns:
| Type | Description |
|---|---|
List[List[List]]
|
A list of lists of lists ordered by depth. |
Example
This method will return a list that could look something like this:
Here, the outer index of this list is the depth, the middle index is which chain of tasks in that depth, and the inner index is the task id in that chain.
Source code in merlin/study/dag.py
group_tasks(source_node)
Group independent tasks in a Directed Acyclic Graph (DAG).
Starts from a source node and works down, grouping tasks by depth, then identifies independent parallel chains in those groups.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
source_node
|
str
|
The source node from which to start grouping tasks. |
required |
Returns:
| Type | Description |
|---|---|
List[List[List]]
|
A list of independent chains of tasks. |
Source code in merlin/study/dag.py
num_children(task_name)
Find the number of children for the given task in the Directed Acyclic Graph (DAG).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The name of the task to count the children of. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Number of children this task has. |
Source code in merlin/study/dag.py
num_parents(task_name)
Find the number of parents for the given task in the Directed Acyclic Graph (DAG).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The name of the task to count the parents of. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Number of parents this task has. |
Source code in merlin/study/dag.py
parents(task_name)
Return the parents of the task.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The name of the task to get the parents of. |
required |
Returns:
| Type | Description |
|---|---|
List
|
List of parents of this task. |
Source code in merlin/study/dag.py
step(task_name)
Return a Step object for the given task name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
task_name
|
str
|
The task name. |
required |
Returns:
| Type | Description |
|---|---|
Step
|
A Merlin |