Ask AI

You are viewing an unreleased or outdated version of the documentation

Tutorial, part five: Scheduling your pipeline#

Now that you've written an entire pipeline in Dagster, you will need to run it regularly to keep your assets up to date.

By the end of this part of the tutorial, you'll be able to:

  • Structure your project with code locations
  • Refresh your assets periodically with schedules

Step 1: Scheduling the materializations#

A schedule's responsibility is to launch runs that target a set of assets (or job) at a specified time. Schedules are created with the ScheduleDefinition class.

Dagster's AssetSelection lets you specify a set of assets to target with a schedule. In the example below, AssetSelection.all selects all assets.

To regularly update the assets, add the new ScheduleDefinition import, create a new schedule that targets AssetSelection.all(), and add the schedule to the code location. The code below is how your definitions.py should look after making these changes:

from dagster import (
    AssetSelection,
    Definitions,
    ScheduleDefinition,
    load_assets_from_modules,
)

from . import assets

all_assets = load_assets_from_modules([assets])

# Addition: a ScheduleDefinition targeting all assets and a cron schedule of how frequently to run it
hackernews_schedule = ScheduleDefinition(
    name="hackernews_schedule",
    target=AssetSelection.all(),
    cron_schedule="0 * * * *",  # every hour
)

defs = Definitions(
    assets=all_assets,
    schedules=[hackernews_schedule],
)

Go to the UI, click Automation, and observe your new schedule.

a schedule in the Dagster UI

To test the change, click the schedule's name to view its details. Click the Test Schedule button on the top right corner of the page to trigger the schedule immediately.

how to test a schedule in the Dagster UI

Other ways to automate your pipelines#

You've used a schedule to update your data on a regular cadence. However, there are other ways to automatically trigger updates. For example, sensors can launch runs after routinely polling a source. Check out the Automation guide to learn more.

About definitions#

Up until this point, you defined assets using the @asset decorator. Dagster definitions are entities that Dagster learns about by importing your code. Just now, you used a different kind of definition: a job definition.

Managing one type of definition, such as assets, is easy. However, it can quickly become unruly as your project grows to have a variety of definitions (ex. schedules, jobs, sensors). To combine definitions and have them aware of each other, Dagster provides a utility called the Definitions object.


Next steps#

By now, you've:

  • Grouped your objects with a code location
  • Defined a schedule that targets all assets in your code location
  • Tested the schedule to ensure it works as expected

In the next section, you'll learn how to build more robustness, reusability, and flexibility when connecting to external services by using resources.