Before detailing the different approaches within this code project for handling service principal values, it's important to define how the processing framework can optionally use Azure Key Vault. Specifically, Key Vault can add an extra layer of security to this solution in the following two ways:
Handling the credentials used by Data Factory to authenticate against the metadata SQL database and the Function Apps required by the processing framework for normal operations. This is done as part of the Linked Service connections within Data Factory.
Storing the service principal credentials (Application Id and Secret) required by the framework to interact with worker pipelines and the target Data Factory instance where those worker pipelines reside.
The rest of this page focuses only on the second use case (role) for Azure Key Vault within the processing framework: storing the credentials used when calling worker pipelines.
The processing framework supports the ability to use a different set of credentials for the execution of every single worker pipeline. This also includes the ability to call worker pipelines in different Azure tenants and subscriptions, as well as different Data Factory instances.
To make this possible, each Azure Function used during framework execution is given a set of service principal (SPN) details at runtime and is responsible for instantiating and authenticating its own Data Factory client, provided by the function helper classes. Once the management client connection is made using the .NET SDK, the pipeline classes and methods are called to interact with the worker pipelines.
Within the orchestration pipelines, the authentication details required by a given worker pipeline are therefore retrieved from the metadata database by the infant pipeline activities. In each case, the SPN information is stored in the table [dbo].[ServicePrincipals], which is joined to the worker pipeline information via a link table.
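As a rough illustration of that lookup, the following Python sketch resolves a worker pipeline to its service principal through an in-memory link table. It is a minimal sketch under stated assumptions: the row shapes, field names and the one-SPN-per-worker rule are hypothetical, the real link table is not named on this page, and the framework performs this join in T-SQL inside the metadata database rather than in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePrincipal:
    """Hypothetical stand-in for a [dbo].[ServicePrincipals] row."""
    credential_id: int
    principal_id: str      # Application Id (or a Key Vault URL)
    principal_secret: str  # Secret (or a Key Vault URL)


def get_worker_spn(pipeline_id, link_rows, principals):
    """Return the SPN linked to a worker pipeline.

    link_rows: iterable of (pipeline_id, credential_id) pairs, a hypothetical
               stand-in for the link table.
    principals: dict mapping credential_id to ServicePrincipal.
    """
    matches = [cred_id for pid, cred_id in link_rows if pid == pipeline_id]
    # Assumption for this sketch: exactly one SPN is linked to each worker.
    if len(matches) != 1:
        raise LookupError(
            f"expected one SPN for pipeline {pipeline_id}, found {len(matches)}"
        )
    return principals[matches[0]]


principals = {
    1: ServicePrincipal(1, "app-id-a", "secret-a"),
    2: ServicePrincipal(2, "app-id-b", "secret-b"),
}
links = [(10, 1), (11, 2), (12, 1)]  # workers 10 and 12 share SPN 1
print(get_worker_spn(11, links, principals).principal_id)  # app-id-b
```

Because each worker is linked individually, several workers can share one SPN while another uses a different one, which is what allows workers in different tenants, subscriptions or Data Factory instances.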
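Depending on the framework configuration, the [dbo].[ServicePrincipals] table will contain either:
The SPN Application Id and Secret values themselves, held in the database.
Key Vault URLs that point to the Application Id and Secret values.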
The method of handling SPN details within the processing framework is configured using the database properties table. The property used is called SPNHandlingMethod and can have one of the following values. These values correspond to the behaviour points above.
In both cases, the configuration-aware helper stored procedure [procfwkHelpers].[AddServicePrincipalWrapper] can be used to add your SPN details to the metadata and link them to a worker pipeline.
EXEC [procfwkHelpers].[AddServicePrincipalWrapper] @DataFactory = N'FrameworkFactory', @PrincipalIdValue = '$(CLIENT_ID_or_kvURL)', @PrincipalSecretValue = '$(CLIENT_SECRET_or_kvURL)', @PrincipalName = '$(CLIENT_NAME)';
The database provides the authentication details to the Azure Functions when the infant pipeline activity retrieves the worker authentication information; this includes the tenant and subscription Id values. Where applicable, these values are decrypted at runtime by the stored procedure [procfwk].[GetWorkerAuthDetails] and added to the various function request bodies via a pipeline variable.
The worker pipeline authentication details are requested and returned from the database once per infant pipeline.
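The "fetch once, reuse for every function call" pattern can be sketched in Python as follows. This is a minimal illustration only: the JSON field names (tenantId, subscriptionId, applicationId, authenticationKey, pipelineName, runId) and the helper functions are hypothetical, and the fetch callable simply stands in for the database lookup whose result the real framework holds in a pipeline variable.

```python
import json


def build_auth_body(tenant_id, subscription_id, app_id, app_secret):
    """Shared authentication fields for every function call (names hypothetical)."""
    return {
        "tenantId": tenant_id,
        "subscriptionId": subscription_id,
        "applicationId": app_id,
        "authenticationKey": app_secret,
    }


def run_infant(fetch_auth_details, calls):
    """Simulate one infant pipeline run.

    fetch_auth_details: callable standing in for the database lookup; it is
                        invoked once, mirroring the pipeline variable.
    calls: list of dicts holding call-specific fields.
    Returns the JSON request bodies that would be sent to the functions.
    """
    auth = fetch_auth_details()  # one database round trip per infant pipeline
    bodies = []
    for call in calls:
        body = dict(auth)  # copy so each request gets its own body
        body.update(call)  # add call-specific values, e.g. the pipeline name
        bodies.append(json.dumps(body))
    return bodies


lookups = []


def fetch():
    lookups.append(1)
    return build_auth_body("tenant-guid", "subscription-guid", "app-id", "app-secret")


bodies_sent = run_infant(
    fetch, [{"pipelineName": "Wait 1"}, {"pipelineName": "Wait 1", "runId": "abc"}]
)
print(len(lookups), len(bodies_sent))  # 1 2
```

Fetching once keeps the number of database calls independent of how many function calls (execute, status checks and so on) a single worker needs.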
The database provides the authentication details to the Azure Functions when the infant pipeline activity retrieves the worker authentication information. However, the App Id and App Secret values are Key Vault URLs rather than the actual decrypted values.
Using its internal helper methods, the function recognises that a URL has been provided in the request body and instantiates its own Key Vault client, authenticating with the Function App Managed Service Identity (MSI). It then queries Key Vault using the URL to return the secret values.
Once the Key Vault URLs have been resolved to values, the Data Factory client is established.
If you use this approach for handling SPN details, the Function App MSI needs to be added to the Key Vault access policy within your environment, with permission to get secrets.
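The URL detection and resolution step can be sketched in Python as below. This is a minimal sketch, not the framework's helper: the detection rule (an https URL on the public-cloud .vault.azure.net domain with a /secrets/ path), the request field names and the pass-through of non-URL values are all assumptions, and the secret lookup is injected as a callable so the example runs without Azure.

```python
from urllib.parse import urlparse


def is_key_vault_url(value):
    """Heuristic: does the value look like a Key Vault secret URL?

    Assumes the public Azure cloud DNS suffix, for example
    https://myvault.vault.azure.net/secrets/MySecret
    """
    parsed = urlparse(value)
    return (
        parsed.scheme == "https"
        and parsed.hostname is not None
        and parsed.hostname.endswith(".vault.azure.net")
        and parsed.path.startswith("/secrets/")
    )


def resolve_value(value, get_secret):
    # Look up Key Vault URLs; pass any other value through unchanged.
    return get_secret(value) if is_key_vault_url(value) else value


def resolve_credentials(body, get_secret):
    """Return (app_id, app_secret) ready for creating the Data Factory client.

    get_secret: callable taking a secret URL and returning its value; in a
                deployed function this would wrap a Key Vault client that
                authenticates with the Function App MSI.
    """
    app_id = resolve_value(body["applicationId"], get_secret)
    app_secret = resolve_value(body["authenticationKey"], get_secret)
    return app_id, app_secret


# Usage with an in-memory stand-in for Key Vault.
vault = {
    "https://myvault.vault.azure.net/secrets/AppId": "app-id-123",
    "https://myvault.vault.azure.net/secrets/AppSecret": "s3cr3t",
}
body = {
    "applicationId": "https://myvault.vault.azure.net/secrets/AppId",
    "authenticationKey": "https://myvault.vault.azure.net/secrets/AppSecret",
}
print(resolve_credentials(body, vault.__getitem__))  # ('app-id-123', 's3cr3t')
```

With the Python Azure SDK, get_secret could for instance split the URL into vault URL and secret name and call SecretClient(vault_url, ManagedIdentityCredential()).get_secret(name).value; the framework's own functions use the .NET SDK instead.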
Currently, the processing framework supports only one type of SPN handling for all worker pipelines. If hybrid SPN handling is required, whereby some workers use Key Vault and others have values stored directly in the database, please raise a new feature request.