utils
Module for project-wide utility functions.
apply_list_of_regex(regex_list, list_to_filter, result_list, match=False, display_warning=True)
Apply a list of regex patterns to a list and accumulate the results.
This function takes each regex from the provided list of regex patterns and applies it to the specified list. The results of each successful match or search are appended to a result list. Optionally, it can display a warning if a regex does not match any item in the list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| regex_list | List[str] | A list of regular expressions to apply to list_to_filter. | required |
| list_to_filter | List[str] | The list of strings that the regex patterns will be applied to. | required |
| result_list | List[str] | The list to which the results of the regex filters will be appended. | required |
| match | bool | If True, uses re.match to apply each regex. If False, uses re.search. | False |
| display_warning | bool | If True, displays a warning message when a regex matches no items in the list. | True |
Side Effect
This function modifies the result_list in place.
Source code in merlin/utils.py
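The loop described for apply_list_of_regex can be sketched as follows. This is a minimal illustration, not the Merlin source: it assumes the accumulated "results" are the matching strings themselves and that the warning is emitted through Python's logging module. The name apply_list_of_regex_sketch is hypothetical.

```python
import logging
import re
from typing import List

LOG = logging.getLogger(__name__)


def apply_list_of_regex_sketch(
    regex_list: List[str],
    list_to_filter: List[str],
    result_list: List[str],
    match: bool = False,
    display_warning: bool = True,
) -> None:
    """Hypothetical sketch: append every item matched by each regex to result_list."""
    # re.match anchors at the start of each string; re.search looks anywhere in it.
    finder = re.match if match else re.search
    for regex in regex_list:
        hits = [item for item in list_to_filter if finder(regex, item)]
        if not hits and display_warning:
            LOG.warning("No regex match for %s.", regex)
        # Side effect: result_list is extended in place rather than returned.
        result_list.extend(hits)


results: List[str] = []
apply_list_of_regex_sketch(["^step", "data"], ["step_1", "my_data", "other"], results)
print(results)  # ['step_1', 'my_data']
```

Because result_list is mutated in place, several calls can accumulate matches into the same list.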
cd(path)
Context manager for changing the current working directory.
This context manager changes the current working directory to the specified path
while executing the block of code within the context. Once the block is exited,
it restores the original working directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | The path to the directory to change to. | required |
Yields:
| Type | Description |
|---|---|
| None | Control is yielded back to the block of code within the context. |
Source code in merlin/utils.py
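A context manager with this behavior is commonly written with contextlib.contextmanager. The sketch below is an illustration under that assumption, not the Merlin implementation; the name cd_sketch is hypothetical, and the try/finally (which restores the directory even when the block raises) is a design choice of the sketch.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def cd_sketch(path: str) -> Iterator[None]:
    """Hypothetical sketch: run the with-block inside path, then restore the old cwd."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield  # control passes to the body of the with-block
    finally:
        # Restore the original directory even if the body raised an exception.
        os.chdir(original)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with cd_sketch(tmp):
        print(os.getcwd())  # the temporary directory (symlinks may appear resolved)
    print(os.getcwd() == start)  # True
```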
check_machines(machines)
Check if the current machine is in the list of specified machines.
This function determines whether the hostname of the current machine matches any entry in a provided list of machine names. It returns True if a match is found, otherwise it returns False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| machines | Union[str, List[str], Tuple[str]] | A single machine name or a list/tuple of machine names to compare with the current machine's hostname. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the current machine's hostname matches any of the specified machines; False otherwise. |
Source code in merlin/utils.py
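A minimal sketch of the hostname check, using socket.gethostname from the standard library. The exact matching rule is not stated above; this sketch assumes a machine "matches" when its name occurs within the hostname (so a cluster name also matches its numbered nodes). The name check_machines_sketch is hypothetical.

```python
import socket
from typing import List, Tuple, Union


def check_machines_sketch(machines: Union[str, List[str], Tuple[str, ...]]) -> bool:
    """Hypothetical sketch: True if any listed machine name matches this host."""
    # Accept a bare string as a one-element collection.
    if isinstance(machines, str):
        machines = [machines]
    hostname = socket.gethostname()
    # Assumption: "matches" means the machine name occurs within the hostname,
    # so a name such as "mycluster" would also match a node "mycluster42".
    return any(machine in hostname for machine in machines)


print(check_machines_sketch(socket.gethostname()))  # True
```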
check_pid(pid, user=None)
Check if a given process ID (PID) is in the process list for a specified user.
This function determines whether a specific PID is currently running for the specified user. If no user is specified, it defaults to the current user. It can also check for all users if specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pid | int | The process ID to check for in the process list. | required |
| user | str | The username for which to check the process. If set to 'all_users', checks processes for all users. Defaults to the current user's username if not provided. | None |
Returns:
| Type | Description |
|---|---|
| bool | True if the specified PID is found in the process list for the given user, False otherwise. |
Source code in merlin/utils.py
contains_shell_ref(string)
Check if the given string contains a shell variable reference.
This function searches for shell variable references in the format of $<variable> or ${<variable>}, where <variable> consists of alphanumeric characters and underscores. It returns True if a match is found; otherwise, it returns False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The input string to be checked for shell variable references. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the input string contains a shell variable reference of the form $STR or ${STR}; False otherwise. |
Source code in merlin/utils.py
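The pattern described above ($<variable> or ${<variable>}, with alphanumeric and underscore characters) can be expressed as a single regular expression. This is an illustrative sketch, not the Merlin source; contains_shell_ref_sketch and SHELL_REF_PATTERN are hypothetical names.

```python
import re

# $NAME or ${NAME}, where NAME is letters, digits, and underscores.
SHELL_REF_PATTERN = re.compile(r"\$[A-Za-z0-9_]+|\$\{[A-Za-z0-9_]+\}")


def contains_shell_ref_sketch(string: str) -> bool:
    """Hypothetical sketch: True if string contains a $VAR or ${VAR} reference."""
    return SHELL_REF_PATTERN.search(string) is not None


print(contains_shell_ref_sketch("$HOME/run.sh"))  # True
print(contains_shell_ref_sketch("$(LABEL)/run.sh"))  # False: a $(...) token is not a shell reference
```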
contains_token(string)
Check if the given string contains a token of the form $(STR).
This function uses a regular expression to search for tokens that match the pattern $(<word>), where <word> consists of alphanumeric characters and underscores. It returns True if such a token is found; otherwise, it returns False.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| string | str | The input string to be checked for tokens. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the input string contains a token of the form $(STR); False otherwise. |
Source code in merlin/utils.py
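The $(<word>) token check reduces to one regular-expression search. The sketch below is illustrative only; contains_token_sketch and TOKEN_PATTERN are hypothetical names, and SPECROOT is used purely as an example token.

```python
import re

# $(NAME), where NAME is letters, digits, and underscores.
TOKEN_PATTERN = re.compile(r"\$\([A-Za-z0-9_]+\)")


def contains_token_sketch(string: str) -> bool:
    """Hypothetical sketch: True if string contains a $(NAME) token."""
    return TOKEN_PATTERN.search(string) is not None


print(contains_token_sketch("cd $(SPECROOT)"))  # True
print(contains_token_sketch("echo ${HOME}"))  # False
```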
convert_timestring(timestring, format_method='HMS')
Converts a timestring to a specified format.
This function accepts a timestring in a specific format or an integer representing seconds, and converts it to a formatted string based on the chosen format method. The available format methods are:
- HMS: Represents the duration in 'hours:minutes:seconds' format.
- FSD: Represents the duration in Flux Standard Duration (FSD), expressed as a floating-point number of seconds with an 's' suffix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timestring | Union[str, int] | A string representing time in the format '[days]:[hours]:[minutes]:seconds' (where days, hours, and minutes are optional), or an integer representing time in seconds. | required |
| format_method | str | The method to use for formatting. Must be either 'HMS' or 'FSD'. | 'HMS' |
Returns:
| Type | Description |
|---|---|
| str | A string representation of the converted timestring formatted according to the specified method. |
Source code in merlin/utils.py
convert_to_timedelta(timestr)
Convert a time string or integer to a timedelta object.
The function takes a time string formatted as '[days]:[hours]:[minutes]:seconds', where days, hours, and minutes are optional. If an integer is provided, it is interpreted as the total number of seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timestr | Union[str, int] | The time string in the specified format or an integer representing seconds. | required |
Returns:
| Type | Description |
|---|---|
| timedelta | A timedelta object representing the duration specified by the input string or integer. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the input string does not conform to the expected format or contains more than four time fields. |
Source code in merlin/utils.py
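One way to parse the '[days]:[hours]:[minutes]:seconds' format is to split on colons and right-align the fields so that missing leading fields count as zero. This is a sketch under that assumption, not the Merlin implementation; convert_to_timedelta_sketch is a hypothetical name.

```python
from datetime import timedelta
from typing import Union


def convert_to_timedelta_sketch(timestr: Union[str, int]) -> timedelta:
    """Hypothetical sketch: parse '[days]:[hours]:[minutes]:seconds' or plain seconds."""
    # An integer becomes a single "seconds" field once converted to a string.
    timestr = str(timestr)
    fields = timestr.split(":")
    if len(fields) > 4:
        raise ValueError(f"Too many time fields in '{timestr}' (at most 4 allowed).")
    # Right-align the fields so omitted days/hours/minutes default to 0.
    # int() raises ValueError for empty or non-numeric fields.
    days, hours, minutes, seconds = [0] * (4 - len(fields)) + [int(f) for f in fields]
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


print(convert_to_timedelta_sketch(90))  # 0:01:30
print(convert_to_timedelta_sketch("1:30:00"))  # 1:30:00
print(convert_to_timedelta_sketch("2:00:00:05"))  # 2 days, 0:00:05
```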
determine_protocol(fname)
Determine the file protocol based on the file name extension.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fname | str | The name of the file whose protocol is to be determined. | required |
Returns:
| Type | Description |
|---|---|
| str | The protocol corresponding to the file extension (e.g., 'hdf5'). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the provided file name does not have a valid extension. |
Source code in merlin/utils.py
dict_deep_merge(dict_a, dict_b, path=None, conflict_handler=None)
Recursively merges dict_b into dict_a, performing a deep merge.
This function combines two dictionaries by recursively merging
the contents of dict_b into dict_a. Unlike Python's built-in
dictionary merge, this function performs a deep merge, meaning
it will merge nested dictionaries instead of just updating top-level keys.
Existing keys in dict_a will not be updated unless a conflict handler
is provided to resolve key conflicts.
Credit to a Stack Overflow post.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dict_a | Dict | The dictionary that will be merged into. | required |
| dict_b | Dict | The dictionary to merge into dict_a. | required |
| path | str | The current path in the dictionary tree. This is used for logging purposes during recursion. | None |
| conflict_handler | Callable | A function to handle conflicts when both dictionaries have the same key with different values. The function should return the value to be used in the merged dictionary. If not provided, a warning will be logged for conflicts. | None |
Source code in merlin/utils.py
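The recursive merge can be sketched as below. This is an illustration, not the Merlin source or the credited Stack Overflow code: it assumes dict_a is modified in place, that conflict_handler takes (existing value, incoming value) and returns the value to keep, and that without a handler the existing value is kept and a warning is logged. dict_deep_merge_sketch is a hypothetical name.

```python
import logging
from typing import Any, Callable, Dict, Optional

LOG = logging.getLogger(__name__)


def dict_deep_merge_sketch(
    dict_a: Dict,
    dict_b: Dict,
    path: Optional[str] = None,
    conflict_handler: Optional[Callable[[Any, Any], Any]] = None,
) -> None:
    """Hypothetical sketch: merge dict_b into dict_a in place, recursing into nested dicts."""
    for key, b_val in dict_b.items():
        key_path = f"{path}.{key}" if path else str(key)
        if key not in dict_a:
            # New keys are simply copied over.
            dict_a[key] = b_val
        elif isinstance(dict_a[key], dict) and isinstance(b_val, dict):
            # Both sides hold a dict: merge one level deeper instead of replacing.
            dict_deep_merge_sketch(dict_a[key], b_val, key_path, conflict_handler)
        elif dict_a[key] != b_val:
            if conflict_handler is not None:
                # Assumed handler signature: (existing value, incoming value) -> kept value.
                dict_a[key] = conflict_handler(dict_a[key], b_val)
            else:
                LOG.warning("Conflict at '%s'; keeping the existing value.", key_path)


a = {"x": 1, "nested": {"p": 1}}
dict_deep_merge_sketch(a, {"x": 5, "y": 2, "nested": {"q": 2}})
print(a)  # {'x': 1, 'nested': {'p': 1, 'q': 2}, 'y': 2}
```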
ensure_directory_exists(**kwargs)
Ensure that the directory for the specified aggregate file exists.
This function checks if the directory for the given aggregate_file exists.
If it does not exist, the function creates the necessary directories.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Dict[Any, Any] | Keyword arguments; must include aggregate_file, the path to the aggregate file whose directory is checked. | {} |
Returns:
| Type | Description |
|---|---|
| bool | True if the directory already existed; False otherwise. |
Source code in merlin/utils.py
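A minimal sketch of the directory check, assuming (as the description suggests) that the keyword argument is named aggregate_file and that the directory in question is that file's parent directory. ensure_directory_exists_sketch is a hypothetical name, not the Merlin function.

```python
import os
import tempfile
from typing import Any


def ensure_directory_exists_sketch(**kwargs: Any) -> bool:
    """Hypothetical sketch: create the parent directory of kwargs['aggregate_file'] if needed."""
    dirname = os.path.dirname(os.path.abspath(kwargs["aggregate_file"]))
    if os.path.exists(dirname):
        return True
    # Create every missing intermediate directory.
    os.makedirs(dirname, exist_ok=True)
    return False


with tempfile.TemporaryDirectory() as tmp:
    agg = os.path.join(tmp, "results", "aggregate.hdf5")
    print(ensure_directory_exists_sketch(aggregate_file=agg))  # False: directory was created
    print(ensure_directory_exists_sketch(aggregate_file=agg))  # True: it now exists
```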
expandvars2(path)
Replace shell variables in the given path with their corresponding environment variable values.
This function expands shell-style variable references (e.g., $VAR) in the input path using the current environment variables. It also ensures that any escaped dollar signs (e.g., \$) are not expanded.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | The input path containing shell variable references to be expanded. | required |
Returns:
| Type | Description |
|---|---|
| str | The path with all unescaped shell variables replaced by their values from the environment. |
Source code in merlin/utils.py
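One way to expand variables while skipping escaped ones is a regex with a negative lookbehind for the backslash, passing each unescaped reference to os.path.expandvars. This is a sketch of that technique, not necessarily how Merlin does it; expandvars2_sketch and the DEMO_OUTPUT variable are hypothetical.

```python
import os
import re

# $NAME or ${NAME} that is NOT preceded by a backslash.
UNESCAPED_VAR = re.compile(r"(?<!\\)\$(?:\w+|\{\w+\})")


def expandvars2_sketch(path: str) -> str:
    """Hypothetical sketch: expand unescaped shell variables, leave \\$VAR untouched."""
    # os.path.expandvars leaves references to undefined variables unchanged.
    return UNESCAPED_VAR.sub(lambda m: os.path.expandvars(m.group(0)), path)


os.environ["DEMO_OUTPUT"] = "/tmp/demo"
print(expandvars2_sketch("$DEMO_OUTPUT/run"))  # /tmp/demo/run
print(expandvars2_sketch(r"\$DEMO_OUTPUT/run"))  # \$DEMO_OUTPUT/run
```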
find_vlaunch_var(vlaunch_var, step_cmd, accept_no_matches=False)
Find and return the specified VLAUNCHER variable from the step command.
This function searches for a variable defined in the VLAUNCHER context
within the provided step command string. It looks for the variable in
the format MERLIN_<vlaunch_var>=<value>. If the variable is found,
it returns the variable in a format suitable for use in a command string.
If the variable is not found, the behavior depends on the accept_no_matches flag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vlaunch_var | str | The name of the VLAUNCHER variable (without the prefix 'MERLIN_'). | required |
| step_cmd | str | The command string of a step where the variable may be defined. | required |
| accept_no_matches | bool | If True, returns None if the variable is not found. If False, raises a ValueError. | False |
Returns:
| Type | Description |
|---|---|
| str | The variable in the format '${MERLIN_<vlaunch_var>}' if found; otherwise None (if accept_no_matches is True). |
Raises:
| Type | Description |
|---|---|
| ValueError | If the variable is not found and accept_no_matches is False. |
Source code in merlin/utils.py
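The lookup can be sketched as a regex search for an assignment of the form MERLIN_<vlaunch_var>=<value>. This sketch assumes any non-whitespace value counts as an assignment; the real function's pattern may be stricter. find_vlaunch_var_sketch is a hypothetical name.

```python
import re
from typing import Optional


def find_vlaunch_var_sketch(
    vlaunch_var: str, step_cmd: str, accept_no_matches: bool = False
) -> Optional[str]:
    """Hypothetical sketch: return '${MERLIN_<var>}' if MERLIN_<var>=... is set in step_cmd."""
    # Assumption: any non-whitespace value after '=' counts as a definition.
    pattern = rf"\bMERLIN_{re.escape(vlaunch_var)}=\S+"
    if re.search(pattern, step_cmd):
        return f"${{MERLIN_{vlaunch_var}}}"
    if accept_no_matches:
        return None
    raise ValueError(f"VLAUNCHER variable MERLIN_{vlaunch_var} not found in the step command.")


cmd = "MERLIN_NODES=4\nMERLIN_PROCS=8\n$(VLAUNCHER) python run.py"
print(find_vlaunch_var_sketch("NODES", cmd))  # ${MERLIN_NODES}
print(find_vlaunch_var_sketch("GPUS", cmd, accept_no_matches=True))  # None
```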
get_flux_alloc(flux_path, no_errors=False)
Generate the flux alloc command based on the installed version.
This function constructs the appropriate command for allocating
resources with Flux, depending on the version of Flux installed
at the specified flux_path. It defaults to "{flux_path} alloc"
for versions greater than or equal to 0.48.x. For older versions,
it adjusts the command accordingly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flux_path | str | The full path to the Flux binary. | required |
| no_errors | bool | A flag to suppress error messages and exceptions if set to True. | False |
Returns:
| Type | Description |
|---|---|
| str | The appropriate Flux allocation command as a string. |
Source code in merlin/utils.py
get_flux_cmd(flux_path, no_errors=False)
Generate the Flux run command based on the installed version.
This function determines the appropriate Flux command to use for
running jobs, depending on the version of Flux installed at the
specified flux_path. It defaults to "flux run" for versions
greater than or equal to 0.48.x. For older versions, it adjusts
the command accordingly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flux_path | str | The full path to the Flux binary. | required |
| no_errors | bool | A flag to suppress error messages and exceptions if set to True. | False |
Returns:
| Type | Description |
|---|---|
| str | The appropriate Flux run command as a string. |
Source code in merlin/utils.py
get_flux_version(flux_path, no_errors=False)
Retrieve the version of Flux as a string.
This function executes the Flux binary located at flux_path with the
"version" command and parses the output to return the version number.
If the command fails or the Flux binary cannot be found, it can either
raise an error or return a default version based on the no_errors flag.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| flux_path | str | The full path to the Flux binary. | required |
| no_errors | bool | A flag to suppress error messages and exceptions. If set to True, errors will be logged but not raised. | False |
Returns:
| Type | Description |
|---|---|
| str | The version of Flux as a string. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the Flux binary cannot be found and no_errors is False. |
| ValueError | If the version cannot be determined from the output and no_errors is False. |
Source code in merlin/utils.py
get_package_versions(package_list)
Generate a formatted table of installed package versions and their locations.
This function takes a list of package names and checks for their installed versions and locations. If a package is not installed, it indicates that the package is "Not installed". The output includes the Python version and its executable location at the top of the table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| package_list | List[str] | A list of package names to check for installed versions. | required |
Returns:
| Type | Description |
|---|---|
| str | A formatted string representing a table of package names, their versions, and installation locations. |
Source code in merlin/utils.py
get_pid(name, user=None)
Return the process ID(s) (PID) of processes with the specified name.
This function retrieves the PID(s) of all running processes that match the given name for a specified user. If no user is specified, it defaults to the current user. It can also retrieve PIDs for all users if specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the process to search for. | required |
| user | str | The username for which to retrieve the process IDs. If set to 'all_users', retrieves processes for all users. Defaults to the current user's username if not provided. | None |
Returns:
| Type | Description |
|---|---|
| List[int] | A list of PIDs for processes matching the specified name. Returns None if no matching processes are found. |
Source code in merlin/utils.py
get_procs(name, user=None)
Return a list of tuples containing the process ID (PID) and command line of processes with the specified name.
This function retrieves all running processes that match the given name for a specified user. If no user is specified, it defaults to the current user. It can also retrieve processes for all users if specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the process to search for. | required |
| user | str | The username for which to retrieve the process information. If set to 'all_users', retrieves processes for all users. Defaults to the current user's username if not provided. | None |
Returns:
| Type | Description |
|---|---|
| List[Tuple[int, str]] | A list of tuples, each containing the PID and command line of processes matching the specified name. Returns an empty list if no matching processes are found. |
Source code in merlin/utils.py
get_source_root(filepath)
Find the absolute project path given a file path from within the project.
This function determines the root directory of a project by analyzing the given file path. It works by traversing the directory structure upwards until it encounters a directory name that is not an integer, which is assumed to be the project root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The file path from within the project for which to find the root. | required |
Returns:
| Type | Description |
|---|---|
| str | The absolute path to the root directory of the project. Returns None if the path corresponds to the root directory itself. |
Source code in merlin/utils.py
get_user_process_info(user=None, attrs=None)
Return a list of process information for all of the user's running processes.
This function retrieves and returns details about the currently running processes for a specified user. If no user is specified, it defaults to the current user. It can also return information for all users if specified.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| user | str | The username for which to retrieve process information. If set to 'all_users', retrieves processes for all users. Defaults to the current user's username if not provided. | None |
| attrs | List[str] | A list of attributes to include in the process information. Defaults to ["pid", "name", "username", "cmdline"] if None. If "username" is not included in the list, it will be added. | None |
Returns:
| Type | Description |
|---|---|
| List[Dict] | A list of dictionaries containing the specified attributes for each process belonging to the specified user, or to all users if 'all_users' is specified. |
Source code in merlin/utils.py
get_yaml_var(entry, var, default)
Retrieve the value associated with a specified key from a YAML dictionary.
This function attempts to return the value of var from the provided entry
dictionary. If the key does not exist, it will try to access it as an attribute
of the entry object. If neither is found, the function returns the specified
default value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| entry | Dict[str, Any] | A dictionary representing the contents of a YAML file. | required |
| var | str | The key or attribute name to retrieve from the entry. | required |
| default | Any | The default value to return if the key or attribute is not found. | required |
Returns:
| Type | Description |
|---|---|
| Any | The value associated with var in entry, or default if var is found neither as a key nor as an attribute. |
Source code in merlin/utils.py
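The key-then-attribute lookup described above can be sketched with a try/except around subscripting, falling back to getattr with a default. This is an illustration, not the Merlin source; get_yaml_var_sketch is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Any


def get_yaml_var_sketch(entry: Any, var: str, default: Any) -> Any:
    """Hypothetical sketch: entry[var], else entry.var, else default."""
    try:
        return entry[var]
    except (TypeError, KeyError):
        # TypeError: entry is not subscriptable (e.g. a SimpleNamespace).
        # KeyError: the key is missing. Either way, try attribute access next.
        return getattr(entry, var, default)


print(get_yaml_var_sketch({"nodes": 4}, "nodes", 1))  # 4
print(get_yaml_var_sketch({"nodes": 4}, "procs", 1))  # 1
print(get_yaml_var_sketch(SimpleNamespace(procs=8), "procs", 1))  # 8
```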
is_running(name, all_users=False)
Determine if a process with the specified name is currently running.
This function checks for the existence of a running process with the provided name by executing the 'ps' command. It can be configured to check processes for all users or just the current user.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The name of the process to search for. | required |
| all_users | bool | If True, checks for processes across all users. Defaults to False, which checks only the current user's processes. | False |
Returns:
| Type | Description |
|---|---|
| bool | True if a process with the specified name is found; otherwise, False. |
Source code in merlin/utils.py
is_running_psutil(cmd, user=None)
Determine if a process with the given command is currently running.
This function checks for the existence of any running processes that
match the specified command. It uses the psutil library to gather
process information instead of making a call to the 'ps' command.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cmd | str | The command or command line snippet to search for in running processes. | required |
| user | str | The username for which to check running processes. If set to 'all_users', checks processes for all users. Defaults to the current user's username if not provided. | None |
Returns:
| Type | Description |
|---|---|
| bool | True if at least one matching process is found; otherwise, False. |
Source code in merlin/utils.py
load_array_file(filename, ndmin=2)
Load an array from a file based on its extension.
This function reads an array stored in the specified filename.
It supports three file types based on their extensions:
- .npy for NumPy binary files
- .csv for comma-separated values
- .tab for whitespace (or tab) separated values
The function ensures that the loaded array has at least ndmin dimensions.
If the array is in binary format, it checks the dimensions without altering the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | str | The path to the file to load. | required |
| ndmin | int | The minimum number of dimensions the array should have. | 2 |
Returns:
| Type | Description |
|---|---|
| ndarray | The loaded array. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the file extension is not one of the supported types (.npy, .csv, .tab). |
Source code in merlin/utils.py
load_yaml(filepath)
Safely read a YAML file and return its contents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The file path to the YAML file to be read. | required |
Returns:
| Type | Description |
|---|---|
| Dict | A dict representing the contents of the YAML file. |
Source code in merlin/utils.py
needs_merlin_expansion(cmd, restart_cmd, labels, include_sample_keywords=True)
Check if the provided command or restart command contains variables that require expansion.
This function checks both the command (cmd) and the restart command (restart_cmd)
for the presence of specified labels or sample keywords that indicate a need for variable
expansion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cmd | str | The command inside a study step to check for variable expansion. | required |
| restart_cmd | str | The restart command inside a study step to check for variable expansion. | required |
| labels | List[str] | A list of labels to check for inside cmd and restart_cmd. | required |
| include_sample_keywords | bool | Flag to indicate whether to include default sample keywords in the label check. | True |
Returns:
| Type | Description |
|---|---|
| bool | True if either cmd or restart_cmd contains a label or sample keyword that requires expansion; False otherwise. |
Source code in merlin/utils.py
nested_dict_to_namespaces(dic)
Convert a nested dictionary into a nested SimpleNamespace structure.
This function recursively transforms a dictionary (which may contain other dictionaries) into a structure of SimpleNamespace objects. Each key in the dictionary becomes an attribute of a SimpleNamespace, allowing for attribute-style access to the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dic | Dict | The nested dictionary to be converted. | required |
Returns:
| Type | Description |
|---|---|
| SimpleNamespace | A SimpleNamespace object representing the nested structure of the input dictionary. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the input is not a dictionary. |
Source code in merlin/utils.py
nested_namespace_to_dicts(namespaces)
Convert a nested SimpleNamespace structure into a nested dictionary.
This function recursively transforms a SimpleNamespace (which may contain other SimpleNamespaces) into a dictionary structure. Each attribute of the SimpleNamespace becomes a key in the resulting dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| namespaces | SimpleNamespace | The nested SimpleNamespace to be converted. | required |
Returns:
| Type | Description |
|---|---|
| Dict | A dictionary representing the nested structure of the input SimpleNamespace. |
Raises:
| Type | Description |
|---|---|
| TypeError | If the input is not a SimpleNamespace. |
Source code in merlin/utils.py
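The two conversions above are inverses of each other, and both can be sketched as short recursive functions. These are illustrations with hypothetical names (dict_to_ns_sketch, ns_to_dict_sketch), not the Merlin source; the sketch assumes string keys, since SimpleNamespace attributes must be valid keyword names.

```python
from types import SimpleNamespace
from typing import Any, Dict


def dict_to_ns_sketch(dic: Dict[str, Any]) -> SimpleNamespace:
    """Hypothetical sketch: recursively turn nested dicts into SimpleNamespaces."""
    if not isinstance(dic, dict):
        raise TypeError(f"Expected a dict, got {type(dic).__name__}.")
    # Recurse into nested dicts; leave every other value unchanged.
    return SimpleNamespace(
        **{k: dict_to_ns_sketch(v) if isinstance(v, dict) else v for k, v in dic.items()}
    )


def ns_to_dict_sketch(namespaces: SimpleNamespace) -> Dict[str, Any]:
    """Hypothetical sketch: recursively turn nested SimpleNamespaces into dicts."""
    if not isinstance(namespaces, SimpleNamespace):
        raise TypeError(f"Expected a SimpleNamespace, got {type(namespaces).__name__}.")
    return {
        k: ns_to_dict_sketch(v) if isinstance(v, SimpleNamespace) else v
        for k, v in vars(namespaces).items()
    }


ns = dict_to_ns_sketch({"study": {"name": "demo", "steps": 3}})
print(ns.study.name)  # demo
print(ns_to_dict_sketch(ns))  # {'study': {'name': 'demo', 'steps': 3}}
```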
pickle_data(filepath, content)
Dump content to a pickle file.
This function serializes the given content and writes it to a specified file
in pickle format. The file is opened in write mode, which will overwrite any
existing content in the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The path to the file where the content will be saved. | required |
| content | Any | The data to be serialized and saved to the pickle file. | required |
Source code in merlin/utils.py
pretty_format_hms(timestring)
Format an HMS timestring to remove blank entries and add appropriate labels.
This function takes a timestring in the 'DD:HH:MM:SS' format and formats it by removing any components that are zero and appending the relevant labels (d, h, m, and s for days, hours, minutes, and seconds). The output is a cleaner string representation of the time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timestring | str | A timestring formatted as 'DD:HH:MM:SS'. Each component represents days, hours, minutes, and seconds, respectively. Only the last four components are relevant and may include leading zeros. | required |
Returns:
| Type | Description |
|---|---|
| str | A formatted timestring with non-zero components labeled appropriately. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the input timestring contains more than four components or is not in the expected format. |
Examples:
>>> pretty_format_hms("00:00:34:00")
'34m'
>>> pretty_format_hms("01:00:00:25")
'01d:25s'
>>> pretty_format_hms("00:19:44:28")
'19h:44m:28s'
>>> pretty_format_hms("00:00:00:00")
'00s'
Source code in merlin/utils.py
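The examples above can be reproduced by pairing each component with a label, dropping the zero ones, and falling back to '00s' when everything is zero. This sketch assumes that inputs with fewer than four components are padded on the left with zeros; pretty_format_hms_sketch is a hypothetical name, not the Merlin function.

```python
def pretty_format_hms_sketch(timestring: str) -> str:
    """Hypothetical sketch: drop zero components of 'DD:HH:MM:SS' and label the rest."""
    parts = timestring.split(":")
    if len(parts) > 4:
        raise ValueError(f"Too many components in '{timestring}' (at most 4 allowed).")
    # Assumption: shorter inputs are left-padded so we always have DD, HH, MM, SS.
    parts = ["00"] * (4 - len(parts)) + parts
    # int() raises ValueError for non-numeric components.
    kept = [f"{p}{label}" for p, label in zip(parts, "dhms") if int(p) != 0]
    # An all-zero duration is still shown as seconds.
    return ":".join(kept) if kept else "00s"


print(pretty_format_hms_sketch("00:00:34:00"))  # 34m
print(pretty_format_hms_sketch("01:00:00:25"))  # 01d:25s
print(pretty_format_hms_sketch("00:19:44:28"))  # 19h:44m:28s
print(pretty_format_hms_sketch("00:00:00:00"))  # 00s
```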
regex_list_filter(regex, list_to_filter, match=True)
Apply a regex filter to a list.
This function filters a given list based on a specified regular expression.
Depending on the match parameter, it either matches the regex at the beginning
of each string (re.match) or searches for it anywhere within each string (re.search).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| regex | str | The regular expression to use for filtering the list. | required |
| list_to_filter | List[str] | The list of strings to be filtered based on the regex. | required |
| match | bool | If True, uses re.match to filter items that match the regex from the start. If False, uses re.search to filter items that contain the regex pattern. | True |
Returns:
| Type | Description |
|---|---|
| List[str] | A new list containing the filtered items that match the regex. |
Source code in merlin/utils.py
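The difference between the two modes shows up clearly in a short sketch that compiles the pattern once and selects either its match or its search method. This is illustrative only; regex_list_filter_sketch is a hypothetical name.

```python
import re
from typing import List


def regex_list_filter_sketch(regex: str, list_to_filter: List[str], match: bool = True) -> List[str]:
    """Hypothetical sketch: keep the items the compiled regex matches (or finds)."""
    pattern = re.compile(regex)
    # match: the pattern must occur at the start; search: anywhere in the string.
    test = pattern.match if match else pattern.search
    return [item for item in list_to_filter if test(item)]


names = ["step_a", "post_step", "step_b"]
print(regex_list_filter_sketch("step", names))  # ['step_a', 'step_b']
print(regex_list_filter_sketch("step", names, match=False))  # ['step_a', 'post_step', 'step_b']
```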
repr_timedelta(time_delta, method='HMS')
Represent a timedelta object as a string using a specified format method.
This function formats a given timedelta object according to the chosen method. The available methods are:
- HMS: Represents the duration in 'hours:minutes:seconds' format.
- FSD: Represents the duration in Flux Standard Duration (FSD), expressed as a floating-point number of seconds with an 's' suffix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time_delta | timedelta | The timedelta object to be formatted. | required |
| method | str | The method to use for formatting. Must be either 'HMS' or 'FSD'. | 'HMS' |
Returns:
| Type | Description |
|---|---|
| str | A string representation of the timedelta formatted according to the specified method. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an invalid method is provided. |
Source code in merlin/utils.py
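Both representations can be derived from timedelta.total_seconds(). The sketch below assumes HMS folds days into the hour count and zero-pads each field to two digits, and that FSD is the float seconds value with an 's' suffix, as described above. repr_timedelta_sketch is a hypothetical name, not the Merlin function.

```python
from datetime import timedelta


def repr_timedelta_sketch(time_delta: timedelta, method: str = "HMS") -> str:
    """Hypothetical sketch of the HMS and FSD representations."""
    if method == "HMS":
        # Fold days into hours; sub-second fractions are dropped by int().
        total = int(time_delta.total_seconds())
        hours, remainder = divmod(total, 3600)
        minutes, seconds = divmod(remainder, 60)
        # Assumption: each field is zero-padded to at least two digits.
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    if method == "FSD":
        # Flux Standard Duration: floating-point seconds with an 's' suffix.
        return f"{time_delta.total_seconds()}s"
    raise ValueError(f"Invalid method '{method}'; must be 'HMS' or 'FSD'.")


print(repr_timedelta_sketch(timedelta(days=1, hours=2, minutes=3, seconds=4)))  # 26:03:04
print(repr_timedelta_sketch(timedelta(minutes=90), method="FSD"))  # 5400.0s
```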
verify_dirpath(dirpath)
Verify that the given directory path is valid and return its absolute form.
This function checks if the specified dirpath points to an existing directory.
It expands any user directory shortcuts (e.g., ~) and environment variables
in the provided path before verifying its existence. If the directory does not exist,
a ValueError is raised.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dirpath | str | The path of the directory to verify. | required |
Returns:
| Type | Description |
|---|---|
| str | The verified absolute directory path with expanded environment variables. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the provided directory path does not point to a valid directory. |
Source code in merlin/utils.py
verify_filepath(filepath)
Verify that the given file path is valid and return its absolute form.
This function checks if the specified filepath points to an existing file.
It expands any user directory shortcuts (e.g., ~) and environment variables
in the provided path before verifying its existence. If the file does not exist,
a ValueError is raised.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | The path of the file to verify. | required |
Returns:
| Type | Description |
|---|---|
| str | The verified absolute file path with expanded environment variables. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the provided file path does not point to a valid file. |
Source code in merlin/utils.py
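verify_dirpath and verify_filepath follow the same expand-then-check pattern, which the sketch below shares through one helper. It is an illustration using standard os.path calls, not the Merlin source; the function names and the VERIFY_DEMO_DIR variable are hypothetical.

```python
import os
import tempfile


def _expand_path(path: str) -> str:
    # Expand ~ and environment variables, then make the path absolute.
    return os.path.abspath(os.path.expandvars(os.path.expanduser(path)))


def verify_filepath_sketch(filepath: str) -> str:
    """Hypothetical sketch: return the expanded absolute path if it is an existing file."""
    filepath = _expand_path(filepath)
    if not os.path.isfile(filepath):
        raise ValueError(f"'{filepath}' does not point to a valid file.")
    return filepath


def verify_dirpath_sketch(dirpath: str) -> str:
    """Hypothetical sketch: return the expanded absolute path if it is an existing directory."""
    dirpath = _expand_path(dirpath)
    if not os.path.isdir(dirpath):
        raise ValueError(f"'{dirpath}' does not point to a valid directory.")
    return dirpath


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "spec.yaml")
    with open(target, "w"):
        pass  # create an empty file to verify
    os.environ["VERIFY_DEMO_DIR"] = tmp
    print(verify_filepath_sketch("$VERIFY_DEMO_DIR/spec.yaml") == os.path.abspath(target))  # True
    print(verify_dirpath_sketch(tmp) == os.path.abspath(tmp))  # True
```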
ws_time_to_dt(ws_time)
Convert a workspace timestring to a datetime object.
This function takes a workspace timestring formatted as 'YYYYMMDD-HHMMSS' and converts it into a corresponding datetime object. The input string must adhere to the specified format to ensure accurate conversion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ws_time | str | A workspace timestring in the format 'YYYYMMDD-HHMMSS', where YYYY is the year, MM the month, DD the day, HH the hour, MM the minute, and SS the second. | required |
Returns:
| Type | Description |
|---|---|
| datetime | A datetime object constructed from the provided workspace timestring. |
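For the 'YYYYMMDD-HHMMSS' layout, datetime.strptime with the format "%Y%m%d-%H%M%S" performs the whole conversion. This is a swapped-in standard-library technique shown for illustration, not necessarily how Merlin parses the string; ws_time_to_dt_sketch is a hypothetical name.

```python
from datetime import datetime


def ws_time_to_dt_sketch(ws_time: str) -> datetime:
    """Hypothetical sketch: parse a 'YYYYMMDD-HHMMSS' workspace timestring."""
    # strptime raises ValueError if the string does not fit the format.
    return datetime.strptime(ws_time, "%Y%m%d-%H%M%S")


print(ws_time_to_dt_sketch("20240131-154502"))  # 2024-01-31 15:45:02
```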