Obaro.Olori

Locking down Azure Pipelines to Azure with Workload Identity Federation, no service principal secrets anywhere

The pipeline ran for three years on a service principal client secret in an Azure DevOps Service Connection. Then someone pasted the connection's diagnostic dump into a Slack thread and the secret was viewable to 800 people for eleven minutes. Eleven months later, the same six subscriptions deploy with zero long-lived credentials. Here is the whole rebuild, top to bottom.


The pipeline had been working for three years. Standard shape: an Azure DevOps Service Connection of type "Azure Resource Manager" using a service principal client secret, the secret rotated quarterly via a calendar reminder that someone always forgot, and the connection authorising every Bicep deploy across six subscriptions. On a Tuesday morning in March, someone trying to debug a flaky deploy pasted the connection's diagnostic dump into a Slack thread. The dump included the client secret. Slack indexed it. The secret was viewable to about 800 people for the eleven minutes it took the on-call engineer to spot it and rotate. Eleven minutes of exposure was apparently within tolerance for the security team; they wrote a finding anyway. Two months later the same engineer pasted a different secret in a different channel. The finding became a hard ask: get rid of the secrets.

We did. Eleven months later, the same six subscriptions are being deployed to with no long-lived credentials anywhere in Azure DevOps. The Service Connections use Workload Identity Federation. Pipeline runs mint short-lived tokens. The audit conversation now takes ninety seconds instead of the previous half-hour. Nothing has been rotated since because there is nothing left to rotate.

This is the full setup, from the Entra app registration to the Service Connection JSON to the pipeline YAML. Forty-three pipelines across the org are running on this pattern. Zero credential incidents in eleven months.

Why workload identity federation, briefly

For readers who have not seen the term: workload identity federation is the protocol by which an external identity provider (in our case Azure DevOps) mints an OIDC token, and Microsoft Entra ID exchanges that token for an Azure access token, with no shared secret between the two systems. The trust is anchored on a subject claim that uniquely identifies the calling workflow run, specifically the org / project / service connection it ran under.

Three properties matter for security. First, the access token Entra mints back is short-lived, measured in minutes rather than months. Second, the OIDC token from Azure DevOps is itself scoped to a specific pipeline run, so reusing it elsewhere does not work. Third, there is nothing on the Azure DevOps side to leak; if your pipeline's run logs end up in a Slack channel, the worst damage is "someone learns your subscription id," which is roughly equivalent to learning your account number, useless on its own.

The same protocol underpins GitHub Actions OIDC and the various third-party CI integrations Entra ID supports. The Azure DevOps version of the flow is documented on Microsoft Learn but that page assumes you are starting fresh. The interesting part of the work is in the migration, not the bootstrap.
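Mechanically, the trust anchors on three claims in the token Azure DevOps mints: the issuer, the subject, and the audience. A quick way to get a feel for them is to pull them out of a token payload with jq. The payload below is a made-up sample shaped like the values this guide produces later, not a real token:

```shell
# Extract the three claims the federation trust is anchored on.
# PAYLOAD is an illustrative sample, not a real Azure DevOps OIDC token.
PAYLOAD='{"iss":"https://vstoken.dev.azure.com/aaaabbbb-cccc-dddd-eeee-ffff00001111","sub":"sc://contoso/platform/sc-ado-platform-prod-eus2","aud":"api://AzureADTokenExchange"}'

ISS=$(echo "$PAYLOAD" | jq -r '.iss')   # must match the federated credential's issuer
SUB=$(echo "$PAYLOAD" | jq -r '.sub')   # must match its subject, exactly
AUD=$(echo "$PAYLOAD" | jq -r '.aud')   # must be api://AzureADTokenExchange

echo "issuer:  $ISS"
echo "subject: $SUB"
echo "aud:     $AUD"
```

If all three line up with a federated credential on an Entra app, the exchange succeeds; any mismatch and it does not.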

What you will have at the end

Three things, versioned in code:

  1. An Entra app registration with a federated credential per Service Connection
  2. Role assignments on the right scope, granted to the app's service principal
  3. An Azure DevOps Service Connection of type azurerm with creationMode: Manual and scheme: WorkloadIdentityFederation

And on the operational side:

  • Pipelines that authenticate to Azure with no secrets in Service Connection storage and no PATs in variable groups
  • A nightly job that audits Service Connections in every project and flags any still using ServicePrincipalKey auth
  • A runbook the platform team uses to vend new connections, end to end, in roughly nine minutes

Scale at the point of writing: 43 Service Connections across 18 Azure DevOps projects, each binding to a different scope in Azure, all running through the same template.

Prerequisites

Run each one and confirm:

az --version         # 2.65 or newer
az extension show --name azure-devops --query version  # 1.0.1 or newer

Required permissions before you start:

  • Application Administrator in Entra ID, or Cloud Application Administrator (so you can create app registrations and federated credentials)
  • Owner at the target scope you want the pipeline to deploy to (so you can grant the SP a role assignment there)
  • Project Collection Administrator in the Azure DevOps org, or at least Service Connection Administrator on the project (so you can create the Service Connection)

A note on the Service Connection type. The classic option is "Azure Resource Manager" with Service principal (manual). The newer option, which is what this guide uses, is also "Azure Resource Manager" but with Service principal (manual) and the auth scheme switched to Workload Identity Federation. The portal flow does not always expose this; the REST API does. The pipeline YAML cares about the result, not how you created it.

Step 1: Register the Entra app

One app registration per Service Connection is the right granularity. The temptation is to share one app across many connections, which works on day one and becomes the worst thing in your tenant by year two when you cannot tell which pipeline is doing what.

APP_NAME="ado-platform-prod-eus2"
APP_ID=$(az ad app create --display-name "$APP_NAME" --query appId -o tsv)
SP_OBJECT_ID=$(az ad sp create --id "$APP_ID" --query id -o tsv)

echo "app id (client id): $APP_ID"
echo "sp object id:       $SP_OBJECT_ID"

The app does not need a redirect URI for this flow. It does not need any API permissions. It is a pure SP that exists to receive a federated token exchange and hold role assignments.

Grant the SP a role at the smallest scope you can get away with. For a pipeline that only deploys into one resource group, Contributor on that resource group is right. For a vending pipeline that creates subscriptions, Owner at the management group is what it needs. The principle is: scope as tight as the work demands, never wider.

SUB_ID="11111111-2222-3333-4444-555555555555"
RG_NAME="rg-platform-prod-eus2"

az role assignment create \
  --assignee "$APP_ID" \
  --role "Contributor" \
  --scope "/subscriptions/$SUB_ID/resourceGroups/$RG_NAME"

If the pipeline needs to read state from a Terraform backend that lives in a different subscription, add a second role assignment scoped to that backend's resource group. Each assignment is one line. The aggregate of all the role assignments is the pipeline's effective permission set.

Step 2: Create the Service Connection (manual mode)

This is the part most teams skip past. The portal has a "create automatically" mode that does the app registration and the federated credential and the Service Connection in one click. It works, but you end up with an app registration in the wrong place in your tenant, named something you did not pick, with role assignments that are too wide because the wizard cannot infer the right scope. Manual mode gives you all of that under your own control.

The Service Connection is created via the Azure DevOps REST API. The endpoint is POST https://dev.azure.com/{org}/_apis/serviceendpoint/endpoints?api-version=7.1-preview.4.

ORG="contoso"
PROJECT_ID="b3f47e91-6c8a-4d12-9a4e-1b9d3e9f4dc1"
SUB_NAME="Platform Prod EUS2"

cat > connection.json <<EOF
{
  "name": "sc-${APP_NAME}",
  "type": "azurerm",
  "url": "https://management.azure.com/",
  "authorization": {
    "scheme": "WorkloadIdentityFederation",
    "parameters": {
      "tenantid": "$(az account show --query tenantId -o tsv)",
      "serviceprincipalid": "${APP_ID}",
      "scope": "/subscriptions/${SUB_ID}"
    }
  },
  "data": {
    "environment": "AzureCloud",
    "scopeLevel": "Subscription",
    "subscriptionId": "${SUB_ID}",
    "subscriptionName": "${SUB_NAME}",
    "creationMode": "Manual"
  },
  "isShared": false,
  "isReady": true,
  "serviceEndpointProjectReferences": [
    {
      "projectReference": {
        "id": "${PROJECT_ID}",
        "name": "platform"
      },
      "name": "sc-${APP_NAME}"
    }
  ]
}
EOF

# az rest needs an explicit --resource for non-ARM endpoints;
# 499b84ac-1321-427f-aa17-267ca6975798 is the Azure DevOps application id
az rest \
  --method post \
  --resource 499b84ac-1321-427f-aa17-267ca6975798 \
  --uri "https://dev.azure.com/${ORG}/_apis/serviceendpoint/endpoints?api-version=7.1-preview.4" \
  --headers "Content-Type=application/json" \
  --body @connection.json \
  > created-connection.json

SC_ID=$(jq -r '.id' created-connection.json)
ISSUER=$(jq -r '.authorization.parameters.workloadIdentityFederationIssuer' created-connection.json)
SUBJECT=$(jq -r '.authorization.parameters.workloadIdentityFederationSubject' created-connection.json)

echo "Service Connection id:    ${SC_ID}"
echo "ADO-issued issuer URL:    ${ISSUER}"
echo "ADO-issued subject claim: ${SUBJECT}"

The two fields that matter, that you will plug into the federated credential in Step 3, are the issuer (something like https://vstoken.dev.azure.com/{ado_organisation_id}) and the subject (a string of the shape sc://{org}/{project}/{connection-name}). Azure DevOps generates these on connection creation. They are derivable, but the safer move is to read them back from the connection rather than reconstruct them.
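Because the subject has a predictable shape, audit tooling can split it back into its parts. A small sketch over a sample subject string, using plain parameter expansion:

```shell
# Split an sc:// subject claim into organisation / project / connection name.
# SUBJECT is a sample value of the shape Azure DevOps generates.
SUBJECT="sc://contoso/platform/sc-ado-platform-prod-eus2"

REST="${SUBJECT#sc://}"      # drop the scheme prefix
ADO_ORG="${REST%%/*}"        # first segment: the Azure DevOps organisation
REST="${REST#*/}"
ADO_PROJECT="${REST%%/*}"    # second segment: the project
SC_NAME="${REST#*/}"         # remainder: the Service Connection name

echo "org=$ADO_ORG project=$ADO_PROJECT connection=$SC_NAME"
```

Our nightly audit uses this split to group flagged connections by project.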

The creationMode: "Manual" field is the bit that tells Azure DevOps not to create its own app registration. Without that, the API call attempts to auto-create one, which is the path we are avoiding because we want to control the app's name, location, and role assignments ourselves.

Step 3: Add the federated credential on the Entra app

This is where the trust between Azure DevOps and Entra ID is anchored. The federated credential says, in effect: when a token signed by {issuer} arrives with subject {subject}, treat it as authentication for this app.

cat > federated-credential.json <<EOF
{
  "name": "ado-fed-cred",
  "issuer": "${ISSUER}",
  "subject": "${SUBJECT}",
  "audiences": ["api://AzureADTokenExchange"]
}
EOF

az ad app federated-credential create \
  --id "${APP_ID}" \
  --parameters federated-credential.json

The audiences value is fixed at api://AzureADTokenExchange. Do not change it. Azure DevOps mints its OIDC tokens with exactly that aud claim, and Entra ID checks the token's aud against the credential's audiences list during the exchange; if they do not match, the exchange fails with AADSTS70021.

The subject is exact match. The string sc://contoso/platform/sc-ado-platform-prod-eus2 does not match sc://contoso/Platform/sc-ado-platform-prod-eus2 (capitalisation matters) and does not match a wildcard. If you rename the Service Connection later, you must update the federated credential. We learned that the messy way; the runbook now includes a step that re-reads the issuer and subject after any connection edit.
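A cheap guard against that failure mode is to diff the two strings byte for byte before touching anything. A sketch with sample values; in practice, read the real ones back via `az ad app federated-credential list` and from created-connection.json:

```shell
# Compare the subject stored on the federated credential against the subject
# the Service Connection reports. Sample values; note the capital P below.
FC_SUBJECT="sc://contoso/platform/sc-ado-platform-prod-eus2"
SC_SUBJECT="sc://contoso/Platform/sc-ado-platform-prod-eus2"

# Subjects are compared byte for byte; any difference, including case, fails.
if [ "$FC_SUBJECT" = "$SC_SUBJECT" ]; then
  RESULT="match"
else
  RESULT="mismatch"
fi
echo "$RESULT"
```

This pair reports mismatch, which is exactly the AADSTS70021 trap described above.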

Step 4: The pipeline

Once Steps 1 through 3 are done, the pipeline is small. The Service Connection name is the only piece that varies; the authentication mechanics happen inside AzureCLI@2 and AzurePowerShell@5.

trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

variables:
  serviceConnection: 'sc-ado-platform-prod-eus2'

stages:
  - stage: Plan
    displayName: 'Bicep what-if'
    jobs:
      - job: WhatIf
        steps:
          - checkout: self
          - task: AzureCLI@2
            displayName: 'az deployment what-if'
            inputs:
              azureSubscription: $(serviceConnection)
              scriptType: bash
              scriptLocation: inlineScript
              inlineScript: |
                az deployment group what-if \
                  --resource-group rg-platform-prod-eus2 \
                  --template-file ./infra/main.bicep \
                  --parameters @./infra/prod.bicepparam \
                  -o table

  - stage: Apply
    dependsOn: Plan
    displayName: 'Deploy to prod'
    jobs:
      - deployment: ApplyBicep
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self
                - task: AzureCLI@2
                  displayName: 'az deployment group create'
                  inputs:
                    azureSubscription: $(serviceConnection)
                    scriptType: bash
                    scriptLocation: inlineScript
                    inlineScript: |
                      az deployment group create \
                        --resource-group rg-platform-prod-eus2 \
                        --template-file ./infra/main.bicep \
                        --parameters @./infra/prod.bicepparam

Four things that look small but matter.

azureSubscription: $(serviceConnection) is how the task is told which Service Connection to use. The variable indirection is so the same pipeline file can be reused across environments with different connection names, set via variable group or template parameter.

The environment: prod on the deployment job is what triggers Azure DevOps environment approval gates. If you have configured your prod environment to require approval from named reviewers, this is where it kicks in. Workload identity federation does not bypass approval gates; it only changes what authenticates, not who approves.

Inside the inline script, az commands run as the authenticated service principal automatically. You do not need az login because the AzureCLI@2 task has already done it using the federated token. Tasks of type AzurePowerShell@5, AzureFunctionApp@2, AzureWebApp@1, and the rest of the Microsoft-supplied tasks behave the same way: they pick up the connection's authentication and execute under that identity.

There is no client secret anywhere in this file. Searching the run log for client_secret returns zero hits. That is the property you want.

Step 5: Multi-environment patterns

Production should require explicit approval; non-production should not. The pattern we use is one Service Connection per environment per project, each bound to a different federated credential subject.

# Dev: federated cred lets any pipeline in this project's dev path authenticate
SUBJECT_DEV="sc://contoso/platform/sc-ado-platform-dev-eus2"

# Prod: same shape, but the Service Connection itself is gated by ADO environment approval
SUBJECT_PROD="sc://contoso/platform/sc-ado-platform-prod-eus2"

The subject claim is project-scoped and connection-scoped. A pipeline running under the dev connection cannot use the prod connection's token; the issuer-subject pair does not match. This means a development pipeline cannot, by mistake, deploy into production, because the underlying authentication will fail before the deploy command even runs.

For pull-request runs that should be allowed to plan but not apply, we use a separate Service Connection scoped to Reader on the target resource group and bound to a PR-specific subject. The PR pipeline uses that connection; the merge-to-main pipeline uses the writer connection. Same Entra app pattern, different role assignments, different federated credential subjects.

# Two connections from the same overall pattern:
#   sc-ado-platform-prod-eus2          → Contributor on the RG, federated subject for main
#   sc-ado-platform-prod-eus2-readonly → Reader on the RG, federated subject for PR runs

Migration from existing secret-based connections

The pattern we used to retire 43 existing Service Connections without a feature freeze:

  1. For each existing connection, identify its current SP and the role assignments on it. We exported these via az role assignment list --assignee $OLD_SP_ID --all.
  2. Create a new app registration for the new connection, alongside the old one. Both live in Entra simultaneously.
  3. Apply the same role assignments to the new app's SP. At this point, both apps can deploy into the same scope. Nothing has changed for the running pipeline.
  4. Create the new Service Connection in workload identity federation mode (Steps 1 through 3 above).
  5. Update the pipeline to reference the new connection name. Merge to main. The pipeline now runs through the new connection.
  6. Wait one week of normal deploy activity through the new connection.
  7. Delete the old Service Connection, then revoke the role assignments on the old SP, then delete the old app registration.

The week between Step 5 and Step 7 is the safety margin. If something unexpected fails on the new connection, the old one is still there and a five-minute revert in the pipeline file puts you back on it. We hit that revert exactly once across 43 migrations; the cause was a renamed pipeline variable, not an authentication problem, but the revert was useful for the day it took to diagnose.
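Step 3 of that list, copying role assignments across, is mechanical enough to script. A sketch, assuming the step-1 export was saved as old-assignments.json and using a made-up app id; jq emits one create command per assignment so they can be reviewed before running:

```shell
# Turn an exported role-assignment list (migration step 1) into
# `az role assignment create` commands for the new SP (step 3).
# NEW_APP_ID and the JSON below are sample values for illustration.
NEW_APP_ID="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

cat > old-assignments.json <<'EOF'
[
  {"roleDefinitionName": "Contributor",
   "scope": "/subscriptions/11111111-2222-3333-4444-555555555555/resourceGroups/rg-platform-prod-eus2"}
]
EOF

# One command per assignment; review the output, then execute it.
jq -r --arg app "$NEW_APP_ID" \
  '.[] | "az role assignment create --assignee \($app) --role \"\(.roleDefinitionName)\" --scope \(.scope)"' \
  old-assignments.json
```

Emitting commands rather than running them keeps a human review in the loop, which matters when the old SP carries wider roles than the new connection should inherit.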

Across the full migration, total time per connection: roughly 18 minutes of platform-engineer time, end to end. With 43 connections, that is about 13 hours of focused work. We spread it across four weeks, doing ten or so per week, to leave room for the soak.

Audit and rotation

There is, deliberately, nothing to rotate. The federated credentials on the Entra app do not expire. The Azure DevOps OIDC tokens are minted fresh per pipeline run and expire in 15 minutes naturally.

The audit angle is the one the security team cares about. Their question used to be "show me the rotation history of the SP secret powering this Service Connection." That question no longer applies because there is no secret. The new question is "show me which Service Connections use workload identity federation and which still use a key." We answer that with a daily script that calls the Azure DevOps REST API across every project in the org:

az devops project list --org "https://dev.azure.com/${ORG}" --query 'value[].id' -o tsv | \
while read PROJECT_ID; do
  az rest \
    --method get \
    --resource 499b84ac-1321-427f-aa17-267ca6975798 \
    --uri "https://dev.azure.com/${ORG}/${PROJECT_ID}/_apis/serviceendpoint/endpoints?type=azurerm&api-version=7.1-preview.4" \
    --query "value[].{name:name,scheme:authorization.scheme,createdOn:createdOn}" \
    -o json
done | jq -s 'flatten | map(select(.scheme != "WorkloadIdentityFederation"))'

If the output is non-empty, we have a Service Connection somewhere that has slipped back to secret-based auth, which is almost always because someone copied an older connection while making a new one. We migrate it on the same day the script flags it.
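The flagging logic is just the final jq filter. Run against a hand-written sample payload (the sc-legacy-copy entry and its ServicePrincipal scheme are made up for illustration), it behaves like this:

```shell
# The audit script's flagging filter against a sample payload.
cat > sample-connections.json <<'EOF'
[
  {"name": "sc-ado-platform-prod-eus2", "scheme": "WorkloadIdentityFederation", "createdOn": "2024-03-01"},
  {"name": "sc-legacy-copy", "scheme": "ServicePrincipal", "createdOn": "2021-06-12"}
]
EOF

# Anything whose scheme is not WorkloadIdentityFederation gets flagged.
FLAGGED=$(jq -r 'map(select(.scheme != "WorkloadIdentityFederation")) | .[].name' sample-connections.json)
echo "$FLAGGED"
```

Only sc-legacy-copy survives the filter, which is the behaviour the nightly job relies on.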

Troubleshooting

AADSTS70021: No matching federated identity record found is the canonical "your subject claim is wrong" error. It is almost never about the federated credential not existing; it is about the credential's subject not matching the pipeline-run subject exactly. Compare them character by character (capitalisation is the usual culprit).

AADSTS700024: Client assertion is not within its valid time range means the OIDC token's issued-at time and the wall-clock time on Entra's side have drifted apart. On hosted agents this almost never happens; on self-hosted agents with a misconfigured NTP this happens regularly. Fix the agent's clock.

Forbidden: The client 'XXX' with object id 'YYY' does not have authorization means the federated credential matched (great) and the role assignment is missing (less great). Add the right role at the right scope to the app's SP via az role assignment create.

ServiceConnectionId is malformed or the connection does not exist from inside an AzureCLI@2 task usually means the connection name in azureSubscription: does not match a connection visible to the project. Either the connection was created in a different project, or it has not yet been shared with this project. Share it via the Service Connection's "Security" tab.

Pipeline run is unable to access Service Connection when you swear you set it up correctly is, in our experience, eight times out of ten an environment approval gate that the run has not satisfied yet. The pipeline sits at "Waiting for approval" and the connection cannot be acquired until the approval lands.
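For the two AADSTS codes above, the triage can be collapsed into a few lines; the wording is our runbook shorthand, not Microsoft's error text:

```shell
# Map the AADSTS codes from this section to their usual cause. Triage aid only.
triage() {
  case "$1" in
    AADSTS70021)  echo "subject or issuer mismatch on the federated credential" ;;
    AADSTS700024) echo "agent clock drift: check NTP on self-hosted agents" ;;
    *)            echo "not a federation error covered here" ;;
  esac
}

triage AADSTS70021
```

The function sits in the runbook next to the remediation steps, so on-call does not have to re-derive the mapping at 2 a.m.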

What changed in the audit conversation, and in the team

Before the migration, the security team's monthly review of our Service Connection inventory took roughly 35 minutes. They picked five connections at random, asked "when was the secret last rotated, who has access, why does this app have Contributor on this subscription instead of a tighter scope." We answered, but most of the answers were either "we don't know" or "the calendar reminder was for May, here is the screenshot." The whole exchange existed because the secret existed; the secret created the audit surface.

After the migration, the same monthly review is ninety seconds. They run the script above, see zero connections on key-based auth, glance at the output, move on. There is no rotation log to inspect because there is no rotation. There is no shared secret to ask about because there are no shared secrets. The conversation moved from "show me your evidence of hygiene" to "show me the trust chain," which is one URL and one paragraph in our wiki.

The team change is the part I underweighted at the start. With secrets, every Service Connection had a small but nonzero operational tax: someone owns the rotation, someone tests after the rotation, someone fields the "the pipeline broke after rotation" incident the morning after. Without secrets, that whole cycle vanishes. Forty-three connections, times four rotations a year, times roughly two hours of attention each: about 340 person-hours a year we no longer spend. The platform team used those hours to build the connection vending pipeline (a separate write-up). The vending pipeline, in turn, makes adding the forty-fourth Service Connection a nine-minute operation rather than the previous half-day. The improvements compound in the way well-chosen platform investments do.

The 800 people who saw the leaked secret in March, by the way, never saw a different one. Eleven months in, the conversation about credential exposure has not had to happen again. That is the actual return on the work.