When deploying Azure resources, always check with the person paying the bill!
In the case of the processing framework, the intention is to keep resources running as cheaply as possible. That said, the following is expected in terms of service tiers.
|Resource|Service Tier|Notes|
|---|---|---|
|Data Factory|v2|Data Flow activities are not used as part of the processing framework, so the default auto-resolving Azure Integration Runtime can be used.|
|Synapse|N/A|Data Flow activities are not used as part of the processing framework, so the default auto-resolving Azure Integration Runtime can be used.|
|SQL Database|S2|Using a provisioned service tier rather than serverless is recommended to avoid framework start-up failures.|
|Functions App|Consumption Plan|Deployments are done using code to a Windows host with support for .NET Core 3.1.|
The service limitations for the processing framework are inherited from Microsoft's Azure resource limitations. For the service tiers described above, the first resource limitation you are likely to hit is for the orchestrator: the allowed number of concurrent pipeline activity runs per subscription and IR region.
|Limit Detail|Default Limit|Maximum Limit|
|---|---|---|
|Concurrent external activity runs per subscription per Azure Integration Runtime region. External activities are managed on the integration runtime but execute on linked services, including Databricks, stored procedure, HDInsight, Web, and others. This limit does not apply to Self-hosted IR.|3,000|3,000|
|Concurrent pipeline activity runs per subscription per Azure Integration Runtime region. Pipeline activities execute on the integration runtime and include Lookup, GetMetadata, and Delete. This limit does not apply to Self-hosted IR.|1,000|1,000|
Within the framework, predicting the split of internal and external activities can be difficult, as this greatly depends on your worker pipelines and on how long the infant pipeline waits between completion status checks.
However, for a single execution stage in a single batch, triggered from the parent pipeline using the default linked service/integration runtime setup, with 50 concurrent worker pipelines and a 30 second infant pipeline wait duration, approximately 700 activity runs will be generated within your framework Data Factory instance.
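As a rough rule of thumb, the figures above can be used to estimate the activity cost per worker pipeline. This is a sketch only: the 14-runs-per-worker ratio is derived from the single test described above, and the assumption that cost scales linearly with worker count is mine, not a property guaranteed by the framework.

```python
OBSERVED_WORKERS = 50
OBSERVED_ACTIVITY_RUNS = 700  # approximate, from the test run described above

# Average activity runs attributable to each worker pipeline.
# Assumption: activity cost scales roughly linearly with worker count.
runs_per_worker = OBSERVED_ACTIVITY_RUNS / OBSERVED_WORKERS

def estimate_activity_runs(workers: int) -> int:
    """Rough estimate of total framework activity runs for one batch."""
    return int(workers * runs_per_worker)

print(runs_per_worker)             # 14.0
print(estimate_activity_runs(50))  # 700
```

Remember that this estimate also depends on the infant pipeline wait duration, since longer waits change how many status-check activities each worker accumulates.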
At the time of testing, no other orchestrator instances were present on an isolated Azure subscription, and the worker pipelines were running in a different Azure tenant.
This gives a theoretical maximum degree of parallelism of 285 worker pipelines.
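Relating this figure back to the 1,000 concurrent pipeline activity run limit gives a back-of-envelope view of the per-worker concurrency load. The ~3.5 concurrent runs per worker is an inference from the two numbers above, not a measured value:

```python
CONCURRENT_RUN_LIMIT = 1000   # pipeline activity runs per subscription per Azure IR region
THEORETICAL_MAX_WORKERS = 285  # from the test conditions described above

# Implied average number of concurrently executing activity runs per
# worker pipeline (an inference from the two figures above).
concurrent_runs_per_worker = CONCURRENT_RUN_LIMIT / THEORETICAL_MAX_WORKERS
print(round(concurrent_runs_per_worker, 1))  # 3.5

def max_parallel_workers(runs_per_worker: float) -> int:
    """Workers that fit under the concurrency limit for a given per-worker load."""
    return int(CONCURRENT_RUN_LIMIT // runs_per_worker)

print(max_parallel_workers(3.5))  # 285
```

If your worker pipelines cause the framework to hold more concurrent activity runs per worker (for example, via shorter infant pipeline wait durations), the achievable parallelism drops accordingly.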
It is possible to push these service limitations by creating a custom set of Azure Integration Runtimes for the orchestrator and by tuning specific parts of the framework pipelines to reduce the number of internal activity calls. To that end, stress testing of the framework has been done using 20 concurrent execution batches, each calling 50 worker pipelines, to achieve a maximum degree of parallelism of 1,000 worker pipelines. This was done under ideal conditions and with a custom configuration of 4x Azure Integration Runtimes in different Azure Regions. If you hit activity run failures during a framework execution run and this is isolated within a subscription, please email firstname.lastname@example.org for support.