Run configuration#

This guide covers using the new Pythonic config system introduced in Dagster 1.3. If your code is still using the legacy APIs, see the legacy configuration guide. To migrate your code, refer to the migrating to Pythonic resources and config guide.

Run configuration allows providing parameters to jobs at the time they're executed.

It's often useful to provide user-chosen values to Dagster jobs or asset definitions at runtime. For example, you might want to provide a connection URL for a database resource. Dagster exposes this functionality through a configuration API.

Various Dagster entities (assets, ops, resources) can be individually configured. When launching a job that materializes (assets), executes (ops), or instantiates (resources) a configurable entity, you can provide run configuration for each entity. Within the function that defines the entity, you can access the passed-in configuration through the config parameter. Typically, the provided run configuration values correspond to a configuration schema attached to the asset/op/resource definition. Dagster validates the run configuration against the schema and proceeds only if validation is successful.

A common use of configuration is for a schedule or sensor to provide configuration to the job run it is launching. For example, a daily schedule might provide the day it's running on to one of the assets as a config value, and that asset might use that config value to decide what day's data to read.


Defining and accessing configuration#

Configurable parameters accepted by an asset or op are specified by defining a config model that subclasses Config and adding a config parameter, annotated with that model, to the corresponding asset or op function. Under the hood, these config models use Pydantic, a popular Python library for data validation and serialization.

During execution, the specified config is accessed within the body of the op or asset using the config parameter.

Using asset definitions#

Here, we define a subclass of Config holding a single string value representing the name of a user. We can access the config through the config parameter in the asset body.

from dagster import asset, Config

class MyAssetConfig(Config):
    person_name: str

@asset
def greeting(config: MyAssetConfig) -> str:
    return f"hello {config.person_name}"

These examples showcase the most basic config types that can be used. For more information on the set of config types Dagster supports, see the advanced config types documentation.


Defining and accessing Pythonic configuration for a resource#

Configurable parameters for a resource are defined by specifying attributes on a resource class that subclasses ConfigurableResource. The resource below defines a configurable connection URL, which can be accessed in any method defined on the resource.

from dagster import ConfigurableResource

class MyDatabaseResource(ConfigurableResource):
    connection_url: str

    def query(self, query: str):
        # get_engine is a placeholder for your own engine factory;
        # it is not provided by Dagster.
        return get_engine(self.connection_url).execute(query)

For more information on using resources, refer to the Resources guide.


Specifying runtime configuration#

To execute a job or materialize an asset that specifies config, you'll need to provide values for its parameters. How we provide these values depends on the interface we are using:

Python#

When specifying config from the Python API, we can use the run_config argument for JobDefinition.execute_in_process or materialize. This takes a RunConfig object, within which we can supply config on a per-op or per-asset basis. The config is specified as a dictionary, with the keys corresponding to the op/asset names and the values corresponding to the config values.

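from dagster import job, materialize, op, RunConfig

# print_greeting is an op configured by MyOpConfig; greeting and MyAssetConfig
# are the asset and config model defined earlier in this guide.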

@job
def greeting_job():
    print_greeting()

job_result = greeting_job.execute_in_process(
    run_config=RunConfig({"print_greeting": MyOpConfig(person_name="Alice")})
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig({"greeting": MyAssetConfig(person_name="Alice")}),
)

Validation#

Dagster validates any provided run config against the corresponding Pydantic model. It will abort execution with a DagsterInvalidConfigError or Pydantic ValidationError if validation fails. For example, both of the following will fail, because there is no nonexistent_config_value in the config schema:

@job
def greeting_job():
    print_greeting()

op_result = greeting_job.execute_in_process(
    run_config=RunConfig(
        {"print_greeting": MyOpConfig(nonexistent_config_value=1)}
    ),
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig({"greeting": MyAssetConfig(nonexistent_config_value=1)}),
)

Using environment variables with config#

Assets and ops can be configured using environment variables by passing an EnvVar when constructing a config object. This is useful when the value is sensitive or may vary based on environment. If using Dagster+, environment variables can be set up directly in the UI.

from dagster import job, materialize, op, RunConfig, EnvVar

job_result = greeting_job.execute_in_process(
    run_config=RunConfig(
        {"print_greeting": MyOpConfig(person_name=EnvVar("PERSON_NAME"))}
    )
)

asset_result = materialize(
    [greeting],
    run_config=RunConfig(
        {"greeting": MyAssetConfig(person_name=EnvVar("PERSON_NAME"))}
    ),
)

Refer to the Environment variables and secrets guide for more general info about environment variables in Dagster.


Next steps#

Config is a powerful tool for making Dagster pipelines more flexible and observable. For a deeper dive into the supported config types, see the advanced config types documentation. For more information on using resources, which are a powerful way to encapsulate reusable logic, see the Resources guide.