Orchestrate.procfwk

A cross tenant metadata driven processing framework for Azure Data Factory and Azure Synapse Analytics achieved by coupling orchestration pipelines with a SQL database and a set of Azure Functions.


Pipelines


Grandparent - Environment Setup

Grandparent Pipeline Role: Optional platform-level setup, for example, scaling up/out compute services ready for the framework to run.

The grandparent level within the processing framework solution is completely optional. It is expected that higher-level platform operations are performed here to make the environment ready before the core framework is triggered. This may include a set of data ingestion processes.
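As a purely illustrative example of a grandparent-level operation, the hedged sketch below scales an Azure SQL database up before processing starts, using the Azure resource management REST API. The subscription, resource group, server, database, target SKU and token handling are all placeholders and are not part of the framework itself.

```python
# Illustrative sketch only: one possible grandparent-level setup step, scaling an
# Azure SQL database before the framework runs. All resource names, the target SKU
# and the token acquisition are placeholders, not framework code.
import requests

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
SQL_SERVER = "<sql-server-name>"
DATABASE = "<database-name>"
TOKEN = "<aad-bearer-token>"  # e.g. obtained via azure-identity

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Sql"
    f"/servers/{SQL_SERVER}/databases/{DATABASE}?api-version=2021-11-01"
)

# Request a higher service objective so compute is ready before processing starts.
response = requests.patch(
    url,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"sku": {"name": "S3", "tier": "Standard"}},
)
response.raise_for_status()
```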


Parent - Framework Executor

Parent Pipeline Role: Execution run wrapper and execution stage iterator.

The parent pipeline is primarily used to set up and/or clean up the next execution run for the framework, depending on the current database Properties. It runs precursor operations and resets the metadata where required in either new or restart scenarios.

As a secondary function at this level, the parent pipeline initiates the first ForEach activity, used to iterate sequentially over Execution Stages. For each iteration the framework also checks for any workers that may have blocked processing due to worker pipeline failures.

Finally, the parent is responsible for getting/setting the Local Execution ID, which is then used throughout all downstream pipelines when making updates to the current execution table in the metadata database.
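To make the parent's role concrete, the following Python sketch mirrors its control flow. Every helper function is a hypothetical stand-in for a framework activity (a stored procedure or Azure Function call), not real framework code.

```python
# A minimal sketch of the parent pipeline's control flow, expressed in Python for
# readability. Each helper is a hypothetical placeholder for a framework activity.
from typing import List
import uuid

def create_or_restart_execution() -> str:
    """Placeholder: create a new execution run, or reset metadata for a restart,
    depending on the current database Properties. Returns the Local Execution ID."""
    return str(uuid.uuid4())

def get_execution_stages(local_execution_id: str) -> List[int]:
    """Placeholder: read the ordered list of execution stages from the metadata."""
    return [1, 2, 3]

def execute_stage(local_execution_id: str, stage_id: int) -> None:
    """Placeholder: call the child pipeline for one execution stage."""

def check_for_blocked_workers(local_execution_id: str) -> bool:
    """Placeholder: return True if a worker failure has blocked downstream stages."""
    return False

def run_framework() -> None:
    # Precursor: new run or restart, then capture the Local Execution ID.
    local_execution_id = create_or_restart_execution()

    # First-level ForEach: iterate over execution stages sequentially.
    for stage_id in get_execution_stages(local_execution_id):
        execute_stage(local_execution_id, stage_id)

        # After each stage, stop if a worker failure has blocked further processing.
        if check_for_blocked_workers(local_execution_id):
            raise RuntimeError("Execution blocked by a failed worker pipeline.")
```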


Child - Stage Executor

Child Pipeline Role: Scale out triggering of worker pipelines within the execution stage.

The child pipeline is called once per execution stage. It is small in structure and has a single purpose: to call the second-level ForEach activity, which triggers all worker pipelines within the current execution stage in parallel. This Scale Out Processing is achieved using the default behaviour of the Data Factory ForEach activity.
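The sketch below illustrates the same fan-out idea in Python. The worker list and the execute_worker call are hypothetical placeholders, and Python threads stand in for the ForEach activity's default parallel behaviour.

```python
# A minimal sketch of the child pipeline's fan-out behaviour: trigger every worker
# registered for the current execution stage in parallel. Helper names are placeholders.
from concurrent.futures import ThreadPoolExecutor
from typing import List

def get_workers_for_stage(local_execution_id: str, stage_id: int) -> List[str]:
    """Placeholder: read the worker pipelines registered for this stage from the metadata."""
    return ["Worker pipeline 1", "Worker pipeline 2", "Worker pipeline 3"]

def execute_worker(local_execution_id: str, worker: str) -> None:
    """Placeholder: hand the worker over to an infant-style execution and monitor it."""

def execute_stage(local_execution_id: str, stage_id: int) -> None:
    workers = get_workers_for_stage(local_execution_id, stage_id)

    # Second-level ForEach: one infant execution per worker, run in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(execute_worker, local_execution_id, w) for w in workers]
        for future in futures:
            future.result()  # surface any worker failure back to the stage
```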


Infant - Worker Executor

Infant Pipeline Role: Worker execution, monitoring and outcome reporting for a single worker pipeline.

Once a worker pipeline has been triggered by the child, one infant pipeline per worker is used to handle the execution and monitoring of its run. The infant uses an Until activity to iterate over the status of its given worker pipeline, waiting until it completes. Once complete, the infant updates the metadata with the relevant status information and, in the event of a worker failure, the error details.

The time between the infant's status check iterations can be configured via the database Properties table.
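As an illustration of this Until-style loop, the hedged sketch below polls a worker pipeline run via the Data Factory REST API until it reaches a terminal state. The factory URI, token handling and the hard-coded wait interval are assumptions; in the framework the interval comes from the Properties table.

```python
# A minimal sketch of the infant's Until-style loop, polling the status of a single
# worker pipeline run until it reaches a terminal state. URIs and token are placeholders.
import time
import requests

FACTORY_URI = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
    "/factories/<factory-name>"
)
TOKEN = "<aad-bearer-token>"
WAIT_SECONDS = 30  # in the framework this would be read from the Properties table

def wait_for_worker(run_id: str) -> str:
    url = f"{FACTORY_URI}/pipelineruns/{run_id}?api-version=2018-06-01"
    while True:
        run = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"}).json()
        status = run["status"]
        if status in ("Succeeded", "Failed", "Cancelled"):
            return status  # the infant would now update the metadata with this outcome
        time.sleep(WAIT_SECONDS)
```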

The infant (ironically) is the largest pipeline within the framework, with the most activities, doing the most work. This is needed to deal consistently and controllably with the execution of the worker pipelines. From a boilerplate code perspective, it is also the pipeline that gets reused/called the most.

The infant pipeline could be used in isolation to trigger a worker pipeline if required.


Worker

Worker Pipeline Role: Anything specific to the process needing to be performed.

Worker pipeline internals fall outside the remit of the processing framework. They exist only as items registered within the metadata plus associated parameters (if required). Worker pipelines are expected to contain whatever activities are required for a given process and should not use content from the processing framework metadata database other than the pipeline-level parameters provided at runtime.

By design, worker pipelines can be decoupled from the main orchestration pipelines above and live in separate Data Factory resources.
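As a hedged illustration of this decoupling, the sketch below triggers a worker pipeline in a separate Data Factory, passing its registered parameters via the createRun REST call. The factory details, pipeline name, parameter values and token handling are placeholders; in the framework, cross-resource worker executions of this kind are handled by the framework's Azure Functions rather than by code like this.

```python
# A minimal sketch of triggering a worker pipeline in a separate Data Factory with its
# registered runtime parameters, using the createRun REST call. Names are placeholders.
import requests

WORKER_FACTORY_URI = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
    "/factories/<worker-factory-name>"
)
TOKEN = "<aad-bearer-token>"

def trigger_worker(pipeline_name: str, parameters: dict) -> str:
    url = f"{WORKER_FACTORY_URI}/pipelines/{pipeline_name}/createRun?api-version=2018-06-01"
    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=parameters,  # pipeline-level parameters registered in the metadata
    )
    response.raise_for_status()
    return response.json()["runId"]  # used by the infant to monitor the run
```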