Getting Started

Installation

Installing adfPy is simple:

pip install adfpy

This will give you:

  • The adfPy classes and functions you need to build awesome pipelines.
  • A CLI command, adfpy-deploy, which parses your adfPy resource definitions and ensures they are correctly configured in your ADF instance.

Writing your first adfPy pipeline

For those of you familiar with Airflow, the syntax will (hopefully) look quite similar. To create a simple pipeline in ADF, you can use the following adfPy code:

from azure.mgmt.datafactory.models import BlobSource, BlobSink

from adfpy.activities.execution import AdfCopyActivity, AdfDatabricksSparkPythonActivity
from adfpy.pipeline import AdfPipeline

extract = AdfCopyActivity(
    name="copyBlobToBlob",
    input_dataset_name="staging",
    output_dataset_name="landing",
    source_type=BlobSource,
    sink_type=BlobSink,
)

ingest = AdfDatabricksSparkPythonActivity(name="ingest", python_file="foo.py")

extract >> ingest

pipeline = AdfPipeline(name="copyPipeline", activities=[extract, ingest], schedule="* * 5 * *")

There are a few noteworthy things here:

1. All adfPy resources are prefixed with Adf. Beyond that, the naming stays as close to ADF's as possible, so it's clear what kind of Activity will be created in ADF when you deploy.
2. Dependencies between activities can be set using the bitshift operators (>> and <<). They work exactly the same as they do in Airflow.
3. As in Airflow, you can define the execution schedule as an attribute of the pipeline.
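If you're curious how the bitshift operators establish dependencies, here is a minimal sketch of the general Airflow-style mechanism using a stand-in `Activity` class. This is an illustration of the pattern, not adfPy's actual implementation:

```python
# Stand-in class illustrating the >> / << dependency pattern.
# Hypothetical names; NOT adfPy's internal implementation.

class Activity:
    def __init__(self, name):
        self.name = name
        self.depends_on = []  # activities that must finish first

    def __rshift__(self, other):
        # a >> b : b depends on a; return b so chains like a >> b >> c work
        other.depends_on.append(self)
        return other

    def __lshift__(self, other):
        # a << b : a depends on b
        self.depends_on.append(other)
        return self


extract = Activity("extract")
ingest = Activity("ingest")

extract >> ingest  # ingest now depends on extract
print([a.name for a in ingest.depends_on])  # ['extract']
```

Because `__rshift__` returns the right-hand operand, you can chain dependencies in a single line (`a >> b >> c`), just as in Airflow.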

This code is available as an example in the examples directory of the adfPy repository, along with more complex cases.