API Docs

These docs aim to cover the entire public surface of the core dagster APIs, as well as public APIs from all provided libraries.

Dagster follows SemVer. We attempt to isolate breaking changes to the public APIs to minor versions (released roughly every 12 weeks), and we announce deprecations in Slack and in the release notes for patch versions (released roughly weekly).
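
As a rough illustration of the policy above, the following sketch flags upgrades that cross a minor (or major) version boundary, since those are the releases where breaking changes may land. The helper names and the version strings are hypothetical, and the sketch assumes plain `X.Y.Z` version strings with no pre-release suffixes.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'X.Y.Z' version string into integer components."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_include_breaking_changes(current: str, target: str) -> bool:
    """Return True when moving from `current` to `target` crosses a
    major or minor version boundary.

    Under the policy described above, patch releases should only carry
    deprecation notices, so a patch-only move returns False.
    """
    cur_major, cur_minor, _ = parse_version(current)
    tgt_major, tgt_minor, _ = parse_version(target)
    return (cur_major, cur_minor) != (tgt_major, tgt_minor)


if __name__ == "__main__":
    # Hypothetical version numbers, used only for illustration.
    print(may_include_breaking_changes("1.7.0", "1.7.5"))
    print(may_include_breaking_changes("1.7.5", "1.8.0"))
```

A check like this could be used to decide when to read the release notes closely before upgrading; it says nothing about whether a given release actually contains a breaking change.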


Core

APIs from the core dagster package, divided roughly by topic:

| Topic | Description |
| --- | --- |
| Asset definitions | APIs to define data assets. |
| Asset checks (Experimental) | APIs to define checks that can be run on assets. |
| Schedules & Sensors | APIs to define schedules and sensors that initiate job execution, as well as some built-in helpers for common cases. |
| Partitions | APIs to define partitions of the config space over which job runs can be backfilled. |
| Definitions (Code locations) | APIs to collect definitions so that tools like the Dagster CLI or Dagster UI can load them as code locations. |
| Resources | APIs to define resources, which are typically used to model external services, tools, and storage for use within jobs. |
| Config | The types available to describe config schemas. |
| Loggers | APIs to define how logs are stored. |
| Ops | APIs to define or decorate functions as ops, declare their inputs and outputs, compose ops with each other, as well as the datatypes that op execution can return or yield. |
| Hooks | APIs to define Dagster hooks, which can be triggered on specific Dagster events. |
| Op graphs | APIs to define a set of interconnected ops. |
| Dynamic mapping and collect | APIs that allow graph structures to be determined at run time. |
| Jobs | APIs to define jobs that execute a set of ops with specific parameters. |
| Execution | APIs to execute and test jobs and individual ops, the execution context available to ops, job configuration, and the default executors available for executing jobs. |
| I/O managers | APIs to define how inputs and outputs are handled and loaded. |
| Types | The types available for use with the Dagster Type system, which helps describe and verify at runtime the values that ops accept and produce. |
| Pipes | APIs for working with the Dagster Pipes protocol from the orchestration side. |
| Dagster CLI | Browse repositories and execute jobs from the command line. |
| Errors | Classes for errors thrown by the Dagster framework. |
| Utilities | Miscellaneous helpers used by Dagster. |
| Internals | Core internal APIs that are important if you are interested in understanding how Dagster works with an eye towards extending it: logging, executors, system storage, the Dagster instance and plugin machinery, storage, schedulers. |
| Repositories (Legacy) | APIs to define collections of jobs and other definitions that tools like the Dagster CLI or Dagster UI can load. Note: Definitions have replaced repositories and are now considered best practice. |

Libraries

Dagster also provides a growing set of optional add-on libraries to integrate with infrastructure and other components of the data ecosystem:

| Integration | Description |
| --- | --- |
| Dagster Pipes (dagster-pipes) | Library for inclusion in external processes when using the Dagster Pipes protocol. |
| Airbyte (dagster-airbyte) | Dagster integrations to run Airbyte jobs. |
| AWS (dagster-aws) | Dagster integrations for working with AWS resources. |
| Azure (dagster-azure) | Dagster integrations for working with Microsoft Azure resources. |
| Celery (dagster-celery) | Provides an executor built on top of the popular Celery task queue, and an executor with support for using Celery on Kubernetes. |
| Celery & Docker (dagster-celery-docker) | Provides an executor that lets Celery workers execute in Docker containers. |
| Celery & Kubernetes (dagster-celery-k8s) | Provides an executor that lets Celery workers execute on Kubernetes. |
| Dask (dagster-dask) | Provides an executor built on top of dask.distributed. |
| dbt (dagster-dbt) | Provides ops and resources to run dbt projects. |
| Databricks (dagster-databricks) | Provides ops and resources for integrating with Databricks. |
| Datadog (dagster-datadog) | Provides an integration with Datadog, to support publishing metrics to Datadog from within Dagster ops. |
| Datahub (dagster-datahub) | Provides an integration with Datahub, to support pushing metadata to Datahub from within Dagster ops. |
| Docker (dagster-docker) | Provides components for deploying Dagster to Docker. |
| DuckDB (dagster-duckdb) | Provides resources for querying DuckDB from Dagster. |
| DuckDB & Pandas (dagster-duckdb-pandas) | Provides support for storing Pandas DataFrames in DuckDB. |
| DuckDB & Polars (dagster-duckdb-polars) | Provides support for storing Polars DataFrames in DuckDB. |
| DuckDB & PySpark (dagster-duckdb-pyspark) | Provides support for storing PySpark DataFrames in DuckDB. |
| Embedded ELT (dagster-embedded-elt) | Provides support for running embedded ELT within Dagster. |
| Fivetran (dagster-fivetran) | Provides ops and resources to run Fivetran syncs. |
| Google Cloud Platform (GCP) (dagster-gcp) | Dagster integrations for working with Google Cloud Platform resources. |
| GCP & Pandas (dagster-gcp-pandas) | Dagster integrations for working with Google Cloud Platform resources with Pandas DataFrames. Currently contains integrations for BigQuery. |
| GCP & PySpark (dagster-gcp-pyspark) | Dagster integrations for working with Google Cloud Platform resources with PySpark DataFrames. Currently contains integrations for BigQuery. |
| Great Expectations (GE) (dagster-ge) | Dagster integrations for working with Great Expectations data quality tests. |
| GitHub (dagster-github) | Provides a resource for issuing GitHub GraphQL queries and filing GitHub issues from Dagster jobs. |
| GraphQL (dagster-graphql) | Provides resources for interfacing with a Dagster deployment over GraphQL. |
| Kubernetes (dagster-k8s) | Provides components for deploying Dagster to Kubernetes. |
| Looker (dagster-looker) | Provides an integration to represent a Looker project as a graph of assets. |
| Microsoft Teams (dagster-msteams) | Includes a simple integration with Microsoft Teams. |
| MLflow (dagster-mlflow) | Provides resources and hooks for using MLflow functionalities with Dagster runs. |
| MySQL (dagster-mysql) | Includes implementations of run and event log storage built on MySQL. |
| PagerDuty (dagster-pagerduty) | Provides an integration for generating PagerDuty events from Dagster ops. |
| Pandas (dagster-pandas) | Provides support for using Pandas DataFrames in Dagster and utilities for performing data validation. |
| Pandera (dagster-pandera) | Provides support for validating Pandas DataFrames using Pandera. |
| Papertrail (dagster-papertrail) | Provides support for sending Dagster logs to Papertrail. |
| Polars (dagster-polars) | Provides support for saving and loading Polars DataFrames in Dagster. |
| PostgreSQL (dagster-postgres) | Includes implementations of run and event log storage built on Postgres. |
| PowerBI (dagster-powerbi) | Provides an integration to represent a PowerBI Workspace as a graph of assets. |
| Prometheus (dagster-prometheus) | Provides support for sending metrics to Prometheus. |
| PySpark (dagster-pyspark) | Provides an integration with PySpark. |
| Shell (dagster-shell) | Provides utilities for issuing shell commands from Dagster jobs. |
| Sigma (dagster-sigma) | Provides an integration to represent a Sigma project as a graph of assets. |
| Slack (dagster-slack) | Provides a simple integration with Slack. |
| Snowflake (dagster-snowflake) | Provides resources for querying Snowflake from Dagster. |
| Snowflake & Pandas (dagster-snowflake-pandas) | Provides support for storing Pandas DataFrames in Snowflake. |
| Snowflake & PySpark (dagster-snowflake-pyspark) | Provides support for storing PySpark DataFrames in Snowflake. |
| Spark (dagster-spark) | Provides an integration for working with Spark in Dagster. |
| SSH / SFTP (dagster-ssh) | Provides an integration for running commands over SSH and retrieving / posting files via SFTP. |
| Tableau (dagster-tableau) | Provides a resource for integrating Tableau Workspaces. |
| Twilio (dagster-twilio) | Provides a resource for posting SMS messages from ops via Twilio. |
| Weights & Biases (dagster-wandb) | Provides an integration with Weights & Biases (W&B). |
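
Because these libraries are optional add-ons, it can be useful to check which of them are present in the current environment. The sketch below uses only the Python standard library; the function name `installed_dagster_libraries` is hypothetical, and the `dagster-` prefix match is an assumption based on the package names listed above, so it will also pick up any third-party distribution that uses the same prefix.

```python
from importlib.metadata import distributions


def installed_dagster_libraries() -> dict[str, str]:
    """Map each installed distribution whose name starts with
    'dagster-' to its installed version."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing or unreadable metadata.
        if name and name.lower().startswith("dagster-"):
            found[name.lower()] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in installed_dagster_libraries().items():
        print(f"{name}=={version}")
```

In an environment with none of these libraries installed, the function returns an empty dictionary.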