
Configuration

Note

Merlin works best when Celery is configured to use a RabbitMQ broker and a Redis results backend. Merlin uses Celery chords, which require a results backend to be configured. The AMQP (RPC RabbitMQ) results backend does not support chords, but Redis, database, Memcached, and several other backends do.

The Celery library provides several ways to configure both your broker and your results backend. This page will go over why configuration is necessary and will detail the different configurations that Merlin supports.

Why is Configuration Necessary?

As explained in the User Guide Landing Page, Merlin uses a central server to store tasks in a queue, from which workers pull and execute them. Merlin implements this functionality with the Celery library, so users must configure both a broker and a results backend.

What is a Broker?

A broker is a message queue that acts as an intermediary between the sender of a task and the worker processes that execute the task. It facilitates the communication between different parts of a distributed system by passing messages (tasks) from producers (the code that generates tasks) to consumers (worker processes that execute tasks).

The broker is responsible for queuing the tasks and delivering them to the appropriate worker processes. It allows for the decoupling of the task producer and the task consumer, enabling a more scalable and flexible architecture.
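The decoupling described above can be illustrated with a minimal, in-process sketch. Here Python's standard `queue.Queue` stands in for the broker. This is only an analogy and not how Celery or RabbitMQ are implemented: the producer publishes task messages without knowing which worker will run them, and any free worker thread consumes the next message.

```python
import queue
import threading

# The queue stands in for the broker: the producer puts task messages on it
# and never talks to the workers directly.
broker = queue.Queue()
results = []
results_lock = threading.Lock()


def producer(n_tasks):
    """Publish task messages without knowing which worker will run them."""
    for i in range(n_tasks):
        broker.put({"task": "square", "arg": i})


def worker():
    """Consume task messages until a None shutdown sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            broker.task_done()
            break
        with results_lock:
            results.append(message["arg"] ** 2)
        broker.task_done()


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

producer(10)
for _ in workers:
    broker.put(None)  # one shutdown sentinel per worker
for w in workers:
    w.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Adding more worker threads requires no change to `producer`. That independence is what a real broker provides across machines.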

Celery supports various message brokers, including RabbitMQ, Redis, and others. You can configure Celery to use a specific broker based on your requirements (although we suggest using RabbitMQ).

See the Configuring the Broker and Results Backend section below for more information on configuring your broker.

What is a Results Backend?

The results backend is a storage system where the results of executed tasks are stored. After a task is executed by a worker, the result is stored in the result backend, and the original task sender can retrieve the result later.

The results backend enables the asynchronous nature of Celery. Instead of blocking and waiting for a task to complete, the sender can continue with other work and later retrieve the result from the results backend.
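The store-then-retrieve pattern can be sketched in plain Python, assuming a lock-guarded dict as a stand-in for the results backend. `ResultHandle` and `send` are hypothetical names loosely modeled on Celery's `AsyncResult`, not Celery's actual API. The worker writes its result under the task id, and the sender keeps working and fetches the result later by that id.

```python
import threading
import time
import uuid

# A dict guarded by a lock plays the role of the results backend.
backend = {}
backend_lock = threading.Lock()


def run_task(task_id, x):
    """Worker side: execute the task, then store the result under its id."""
    time.sleep(0.1)  # simulate work
    with backend_lock:
        backend[task_id] = x * 2


class ResultHandle:
    """Sender side: hypothetical handle that looks results up by task id."""

    def __init__(self, task_id):
        self.task_id = task_id

    def get(self, timeout=5.0, interval=0.01):
        """Poll the backend until the result appears or the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with backend_lock:
                if self.task_id in backend:
                    return backend[self.task_id]
            time.sleep(interval)
        raise TimeoutError(self.task_id)


def send(x):
    """Start the task in the background and return a handle immediately."""
    task_id = str(uuid.uuid4())
    threading.Thread(target=run_task, args=(task_id, x)).start()
    return ResultHandle(task_id)


handle = send(21)
other_work = sum(range(1000))  # the sender keeps going instead of blocking
print(handle.get())  # 42
```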

Celery supports various results backends, including databases (through SQLAlchemy or the Django ORM), message brokers (Redis, RabbitMQ), and others. You can configure Celery to use a specific results backend based on your requirements (although we suggest using Redis). However, since Merlin uses Celery chords and the AMQP (RPC RabbitMQ) backend does not support chords, RabbitMQ cannot be used as Merlin's results backend.
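The reason chords need a real results backend can be seen in a small simulation. A chord runs a group of "header" tasks and then calls a callback with all of their results. Celery's documentation describes the Redis and Memcached backends as incrementing a counter after each header task and firing the callback once the count is reached. That requires shared state that every worker can update. The RPC backend instead sends each result back to the calling client as a message, so workers have no shared place to count completions. The sketch below is a hypothetical in-process analogue (`SharedResultsBackend` and `run_chord` are illustrative names, not Celery code) in which whichever header task finishes last fires the callback.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedResultsBackend:
    """Hypothetical shared store that every worker can read and write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}
        self._counters = {}

    def store_part(self, chord_id, index, value):
        """Record one header result and return how many parts are done."""
        with self._lock:
            self._results.setdefault(chord_id, {})[index] = value
            self._counters[chord_id] = self._counters.get(chord_id, 0) + 1
            return self._counters[chord_id]

    def ordered_results(self, chord_id):
        """Return the header results in the order the tasks were submitted."""
        with self._lock:
            parts = self._results[chord_id]
            return [parts[i] for i in sorted(parts)]


def run_chord(header_args, header_fn, callback, backend, chord_id="chord-1"):
    """Run header tasks in parallel; the last one to finish fires the callback."""
    total = len(header_args)
    outcome = {}

    def header_task(index, arg):
        done = backend.store_part(chord_id, index, header_fn(arg))
        if done == total:  # exactly one task observes the final count
            outcome["value"] = callback(backend.ordered_results(chord_id))

    with ThreadPoolExecutor(max_workers=4) as pool:
        for i, arg in enumerate(header_args):
            pool.submit(header_task, i, arg)
    return outcome["value"]


result = run_chord([1, 2, 3, 4], lambda x: x * x, sum, SharedResultsBackend())
print(result)  # 30
```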

See the Configuring the Broker and Results Backend section below for more information on configuring your results backend.

The app.yaml File

To read configuration options for Celery, the broker, and the results backend, Merlin uses an app.yaml file.

Merlin provides a built-in command that sets up a skeleton app.yaml for you:

merlin config

This command will create an app.yaml file in the ~/.merlin/ directory that looks like so:

celery:
    # see Celery configuration options
    # https://docs.celeryproject.org/en/stable/userguide/configuration.html
    override:
        visibility_timeout: 86400

broker:
    # can be redis, redis+sock, or rabbitmq
    name: rabbitmq
    #username: # defaults to your username unless changed here
    password: ~/.merlin/jackalope-password
    # server URL
    server: jackalope.llnl.gov

    ### for rabbitmq, redis+sock connections ###
    #vhost: # defaults to your username unless changed here

    ### for redis+sock connections ###
    #socketname: the socket name your redis connection can be found on.
    #path: The path to the socket.

    ### for redis connections ###
    #port: The port number redis is listening on (default 6379)
    #db_num: The database number to connect to.


results_backend:
    # must be redis
    name: redis
    dbname: mlsi
    username: mlsi
    # name of file where redis password is stored.
    password: redis.pass
    server: jackalope.llnl.gov
    # merlin will generate this key if it does not exist yet,
    # and will use it to encrypt all data over the wire to
    # your redis server.
    encryption_key: ~/.merlin/encrypt_data_key
    port: 6379
    db_num: 0
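To show how these fields fit together, the following sketch turns the broker and results_backend sections into connection URLs. It is hypothetical and not Merlin's actual code. It assumes that `password` names a file whose first line holds the password (as the comments above suggest), that RabbitMQ is reached over TLS (`amqps`) on port 5671, and that the Redis URL takes the `redis://username:password@host:port/db` form. Resolution of relative password filenames such as `redis.pass` is not handled here. The helper names are illustrative.

```python
import getpass
import os
import tempfile
from urllib.parse import quote


def read_password(path):
    """Read the password from the file named in app.yaml (first line only)."""
    with open(os.path.expanduser(path)) as f:
        return f.readline().strip()


def rabbitmq_url(broker):
    """Hypothetical: build an AMQP-over-TLS URL from the broker section."""
    user = broker.get("username") or getpass.getuser()  # default: your username
    vhost = broker.get("vhost") or user                 # default: your username
    password = quote(read_password(broker["password"]), safe="")
    port = broker.get("port", 5671)  # assumed TLS port for RabbitMQ
    return f"amqps://{user}:{password}@{broker['server']}:{port}/{vhost}"


def redis_url(backend):
    """Hypothetical: build a Redis URL from the results_backend section."""
    user = backend.get("username", "")
    password = quote(read_password(backend["password"]), safe="")
    port = backend.get("port", 6379)
    db_num = backend.get("db_num", 0)
    return f"redis://{user}:{password}@{backend['server']}:{port}/{db_num}"


with tempfile.TemporaryDirectory() as tmp:
    pw_file = os.path.join(tmp, "pw")
    with open(pw_file, "w") as f:
        f.write("s3cret/pw\n")
    broker = {"name": "rabbitmq", "username": "alice", "password": pw_file,
              "server": "jackalope.llnl.gov"}
    backend = {"name": "redis", "username": "mlsi", "password": pw_file,
               "server": "jackalope.llnl.gov", "port": 6379, "db_num": 0}
    b_url = rabbitmq_url(broker)
    r_url = redis_url(backend)

print(b_url)  # amqps://alice:s3cret%2Fpw@jackalope.llnl.gov:5671/alice
print(r_url)  # redis://mlsi:s3cret%2Fpw@jackalope.llnl.gov:6379/0
```

Percent-encoding the password with `quote(..., safe="")` keeps characters such as `/` or `@` from breaking the URL.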

As you can see, there are three key sections in Merlin's app.yaml file: celery, broker, and results_backend. The rest of this page goes into more depth on each.

The Celery Section

In the celery section of your app.yaml you can override any Celery settings that you may want to change.

Merlin's default Celery configurations are as follows:

Default Celery Configuration
accept_content: ['pickle']  # DO NOT MODIFY
result_accept_content: None  # DO NOT MODIFY
enable_utc: True
imports: ()
include: ()
timezone: None
beat_max_loop_interval: 0
beat_schedule: {}
beat_scheduler: celery.beat:PersistentScheduler
beat_schedule_filename: celerybeat-schedule
beat_sync_every: 0
beat_cron_starting_deadline: None
broker_url: <set in the broker section of app.yaml>
broker_read_url: None
broker_write_url: None
broker_transport: None
broker_transport_options: {'visibility_timeout': 86400, 'max_connections': 100}
broker_connection_timeout: 4
broker_connection_retry: True
broker_connection_retry_on_startup: None
broker_connection_max_retries: 100
broker_channel_error_retry: False
broker_failover_strategy: None
broker_heartbeat: 120
broker_heartbeat_checkrate: 3.0
broker_login_method: None
broker_pool_limit: 0
broker_use_ssl: <set in the broker section of app.yaml>
broker_host: <set in the broker section of app.yaml>
broker_port: <set in the broker section of app.yaml>
broker_user: <set in the broker section of app.yaml>
broker_password: <set in the broker section of app.yaml>
broker_vhost: <set in the broker section of app.yaml>
cache_backend: None
cache_backend_options: {}
cassandra_entry_ttl: None
cassandra_keyspace: None
cassandra_port: None
cassandra_read_consistency: None
cassandra_servers: None
cassandra_bundle_path: None
cassandra_table: None
cassandra_write_consistency: None
cassandra_auth_provider: None
cassandra_auth_kwargs: None
cassandra_options: {}
s3_access_key_id: None
s3_secret_access_key: None
s3_bucket: None
s3_base_path: None
s3_endpoint_url: None
s3_region: None
azureblockblob_container_name: celery
azureblockblob_retry_initial_backoff_sec: 2
azureblockblob_retry_increment_base: 2
azureblockblob_retry_max_attempts: 3
azureblockblob_base_path: 
azureblockblob_connection_timeout: 20
azureblockblob_read_timeout: 120
control_queue_ttl: 300.0
control_queue_expires: 10.0
control_exchange: celery  # DO NOT MODIFY
couchbase_backend_settings: None
arangodb_backend_settings: None
mongodb_backend_settings: None
cosmosdbsql_database_name: celerydb
cosmosdbsql_collection_name: celerycol
cosmosdbsql_consistency_level: Session
cosmosdbsql_max_retry_attempts: 9
cosmosdbsql_max_retry_wait_time: 30
event_queue_expires: 60.0
event_queue_ttl: 5.0
event_queue_prefix: celeryev
event_serializer: json  # DO NOT MODIFY
event_exchange: celeryev  # DO NOT MODIFY
redis_backend_use_ssl: <set in results_backend section of app.yaml>
redis_db: <set in results_backend section of app.yaml>
redis_host: <set in results_backend section of app.yaml>
redis_max_connections: 100000
redis_username: <set in results_backend section of app.yaml>
redis_password: <set in results_backend section of app.yaml>
redis_port: <set in results_backend section of app.yaml>
redis_socket_timeout: 120.0
redis_socket_connect_timeout: None
redis_retry_on_timeout: False
redis_socket_keepalive: False
result_backend: <set in results_backend section of app.yaml>
result_cache_max: -1
result_compression: None
result_exchange: celeryresults
result_exchange_type: direct
result_expires: 1 day, 0:00:00
result_persistent: None
result_extended: False
result_serializer: pickle  # DO NOT MODIFY
result_backend_transport_options: {}
result_chord_retry_interval: 1.0
result_chord_join_timeout: 3.0
result_backend_max_sleep_between_retries_ms: 10000
result_backend_max_retries: inf
result_backend_base_sleep_between_retries_ms: 10
result_backend_always_retry: False
elasticsearch_retry_on_timeout: None
elasticsearch_max_retries: None
elasticsearch_timeout: None
elasticsearch_save_meta_as_text: True
security_certificate: None
security_cert_store: None
security_key: None
security_key_password: None
security_digest: sha256
database_url: None
database_engine_options: None
database_short_lived_sessions: False
database_table_schemas: None
database_table_names: None
task_acks_late: True  # DO NOT MODIFY
task_acks_on_failure_or_timeout: True  # DO NOT MODIFY
task_always_eager: False  # DO NOT MODIFY
task_annotations: None  # DO NOT MODIFY
task_compression: None  # DO NOT MODIFY
task_create_missing_queues: True  # DO NOT MODIFY
task_inherit_parent_priority: False  # DO NOT MODIFY
task_default_delivery_mode: 2  # DO NOT MODIFY
task_default_queue: merlin  # DO NOT MODIFY
task_default_exchange: None  # DO NOT MODIFY
task_default_exchange_type: direct  # DO NOT MODIFY
task_default_routing_key: None  # DO NOT MODIFY
task_default_rate_limit: None
task_default_priority: 5  # DO NOT MODIFY
task_eager_propagates: False
task_ignore_result: False
task_store_eager_result: False
task_protocol: 2  # DO NOT MODIFY
task_publish_retry: True
task_publish_retry_policy: {'interval_start': 10, 'interval_step': 10, 'interval_max': 60}
task_queues: None  # DO NOT MODIFY
task_queue_max_priority: 10  # DO NOT MODIFY
task_reject_on_worker_lost: True  # DO NOT MODIFY
task_remote_tracebacks: False  
task_routes: (<function route_for_task at 0x0123456789ab>,)  # DO NOT MODIFY
task_send_sent_event: False
task_serializer: pickle  # DO NOT MODIFY
task_soft_time_limit: None
task_time_limit: None
task_store_errors_even_if_ignored: False
task_track_started: False
task_allow_error_cb_on_chord_header: False
worker_agent: None  # DO NOT MODIFY
worker_autoscaler: celery.worker.autoscale:Autoscaler  # DO NOT MODIFY
worker_cancel_long_running_tasks_on_connection_loss: True
worker_concurrency: None  # DO NOT MODIFY; this will be set on a worker-by-worker basis that you can customize in your spec file
worker_consumer: celery.worker.consumer:Consumer  # DO NOT MODIFY
worker_direct: False  # DO NOT MODIFY
worker_disable_rate_limits: False
worker_deduplicate_successful_tasks: False
worker_enable_remote_control: True
worker_hijack_root_logger: True
worker_log_color: True
worker_log_format: [%(asctime)s: %(levelname)s] %(message)s
worker_lost_wait: 10.0
worker_max_memory_per_child: None
worker_max_tasks_per_child: None
worker_pool: prefork
worker_pool_putlocks: True  # DO NOT MODIFY
worker_pool_restarts: False
worker_proc_alive_timeout: 4.0
worker_prefetch_multiplier: 4  # this can be modified on a worker-by-worker basis in your spec file
worker_redirect_stdouts: True  # DO NOT MODIFY
worker_redirect_stdouts_level: WARNING
worker_send_task_events: False
worker_state_db: None
worker_task_log_format: [%(asctime)s: %(levelname)s] [%(task_name)s(%(task_id)s)] %(message)s
worker_timer: None
worker_timer_precision: 1.0
deprecated_settings: set()
visibility_timeout: 86400

See Celery's Configuration Settings for more information on each of these settings.

Overriding these settings is as simple as listing a new key-value pair in the celery.override section of your app.yaml.

Example

To change the visibility_timeout and broker_pool_limit settings, we'd modify the celery.override section of our app.yaml like so:

celery:
    override:
        broker_pool_limit: 10
        visibility_timeout: 75000
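Conceptually, the override section is a dictionary merge on top of the defaults. The sketch below is a hypothetical helper, not Merlin's implementation. It applies the example's overrides and also refuses to touch a few of the keys marked "DO NOT MODIFY" in Merlin's default Celery configuration. `PROTECTED_KEYS` is an illustrative subset, and Merlin may not enforce this check itself.

```python
# An illustrative subset of the keys marked "DO NOT MODIFY" in the defaults.
PROTECTED_KEYS = {
    "accept_content", "result_serializer", "task_serializer",
    "task_acks_late", "task_default_queue", "task_queue_max_priority",
    "worker_concurrency",
}


def apply_overrides(defaults, override):
    """Hypothetical helper: return a new config with the overrides applied."""
    blocked = PROTECTED_KEYS.intersection(override)
    if blocked:
        raise ValueError(f"refusing to override: {sorted(blocked)}")
    merged = dict(defaults)  # leave the defaults themselves untouched
    merged.update(override)
    return merged


defaults = {"broker_pool_limit": 0, "visibility_timeout": 86400,
            "task_serializer": "pickle"}
override = {"broker_pool_limit": 10, "visibility_timeout": 75000}
config = apply_overrides(defaults, override)
print(config["broker_pool_limit"], config["visibility_timeout"])  # 10 75000
```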

Configuring the Broker and Results Backend

When it comes to configuring the broker and results_backend sections of your app.yaml file, configuration will depend on the type of user you are and what type of servers you wish to use.

For Livermore Computing (LC) users we recommend configuring with either:

For all other users, we recommend configuring with either: