
Self-hosted Azure DevOps agents on AKS with KEDA: queues from 9 minutes to 23 seconds

The May invoice for Microsoft-hosted parallel jobs was £4,180 and the queue at 10am was eleven deep. Here is how I moved every build pool onto AKS, autoscaled with KEDA, and watched the cost graph bend.


The invoice for May 2024 was £4,180. That number is for Microsoft-hosted parallel jobs alone, sitting on a line item called "Azure Pipelines hosted parallel jobs" against an Azure DevOps org running 27 active pipelines, twelve of which fan out into matrix builds. The same month, our median queue time across all pipelines was 9 minutes 14 seconds. The 95th percentile was over 22 minutes. Two of the team's pipelines timed out on their first Docker build step roughly one run in nine, because Docker Hub rate-limits the unauthenticated pulls that a freshly minted hosted agent always starts with. None of the agents had a warm npm or Maven cache; everything was pulled from scratch on every run. There was no GPU option at all on the Linux hosted pool, which was a problem for one team trying to validate CUDA kernels in CI.

By November we had moved every pipeline that was tolerant of self-hosted infrastructure (39 of the 47 we operate, the rest stayed hosted on purpose) onto an AKS cluster running self-hosted agents through KEDA's azure-pipelines scaler. Median queue time dropped to 23 seconds. The 95th percentile is now 1 minute 41 seconds, and the worst case I can find in the logs from the last three weeks is 4 minutes 06 seconds, which corresponds to a Sunday morning when our spot capacity was reclaimed and KEDA had to fall back to the regular node pool. Cost on the same line item went to £1,210 a month, which includes the AKS node compute. The story below is the actual migration, not the marketing version.

Why hosted agents got expensive

Two reasons that compound. First, hosted parallel jobs are billed per parallel slot, not per minute of work. You buy a "parallel job," you get one queue lane, and a hot slot is the same price as an idle one. We had ten paid lanes on top of the one free Microsoft-grant lane, because the queue depth chart over a normal Tuesday afternoon would routinely hit 11 or 12 jobs waiting. At the rate we were billed, a paid lane was £316 per month, so the ten lanes alone were £3,160. The remaining £1,020 was the larger CI/CD lanes the platform team used for release pipelines that needed bigger VMs.

Second, every hosted job spins up a fresh VM. The agent comes up cold. The first docker pull node:20 against Docker Hub from a new IP is fine; the eighty-third pull that hour from the same Azure-owned IP range is rate-limited. We chased that error for a week before realising it was structural, not a bug in our pipelines. The mitigation was an ACR mirror of Docker Hub, which fixed the rate-limit but did nothing for the cold start of every other tool. Each fresh agent installs gh, pulls our Bicep templates, restores NuGet packages, and bootstraps a 1.3 GB Python venv. The wall-clock on that bootstrap averaged 2 minutes 41 seconds. Across 600 builds a day, that bootstrap alone burned roughly 27 hours of paid compute we never used for actual work.
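
For reference, the mirror is nothing more exotic than an ACR artifact cache rule pointed at Docker Hub. A minimal sketch, with the registry and rule names as placeholders:

# One cache rule per upstream repository we care about.
az acr cache create \
  --registry acrplatform \
  --name node-mirror \
  --source-repo docker.io/library/node \
  --target-repo mirror/node

# Pipelines then pull acrplatform.azurecr.io/mirror/node:20 instead of node:20.

Attaching a credential set that holds a Docker Hub token on top of this also buys the higher authenticated rate limits.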

The decision to move did not come from a single incident; it came because the cost graph crossed a line the finance team had drawn at £4,000 and an automated email landed in two inboxes. Self-hosted Linux agents are documented on Microsoft Learn, but that page assumes a single VM that runs forever. KEDA's Azure Pipelines scaler, running on AKS, is what closes the gap.

What the destination looks like

Three things in place, all in infra/aks-agents/:

  1. An AKS cluster with two node pools. The default system pool runs the KEDA operator and a few cluster add-ons. A user pool called agents runs the actual job pods on Standard_D8s_v5 spot nodes with a fallback regular pool of two nodes for capacity guarantees.
  2. A container image, acr.azurecr.io/devops/agent:2.227.1-ubuntu2204, that holds the Azure Pipelines agent tarball plus the toolchain we actually need (Node 20, Python 3.12, az CLI, kubectl, helm, Bicep, docker buildx). It is built nightly off a multi-stage Dockerfile.
  3. A KEDA ScaledJob per agent pool. We run three pools (aks-linux, aks-linux-gpu, aks-linux-large), each backed by its own ScaledJob watching its own Azure DevOps pool ID.

On the operational side, agents register on pod start, run exactly one job, then exit. Pods are ephemeral. There is no PVC for _work; we tried it, removed it, the reasoning is below. The scaler authenticates to Azure DevOps via workload identity federation, so there is no PAT sitting in a Kubernetes secret. There is one PAT, in a sealed secret, used only for the initial agent registration and rotated every 30 days by a CronJob.

The agent container

Start from ubuntu:22.04 and add the agent. The official guidance is to build everything yourself, which I agree with; the Microsoft-published image is not bad but it does not contain the tools we need, and adding them on top means a fat image you cannot reason about.

FROM ubuntu:22.04 AS base

ENV DEBIAN_FRONTEND=noninteractive
ENV TARGETARCH=amd64
ENV AGENT_VERSION=3.232.3

RUN apt-get update && apt-get install -y --no-install-recommends \
      ca-certificates \
      curl \
      git \
      gnupg \
      iputils-ping \
      jq \
      libicu70 \
      libkrb5-3 \
      libssl3 \
      lsb-release \
      software-properties-common \
      sudo \
      tar \
      unzip \
      zip \
    && rm -rf /var/lib/apt/lists/*

# python3.12 is not in the stock jammy repositories; the deadsnakes PPA provides it
RUN add-apt-repository -y ppa:deadsnakes/ppa \
    && curl -sL https://aka.ms/InstallAzureCLIDeb | bash \
    && curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y --no-install-recommends nodejs python3.12 python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN curl -fsSL https://get.docker.com | sh \
    && curl -fsSLo /usr/local/bin/kubectl \
         https://dl.k8s.io/release/v1.30.3/bin/linux/${TARGETARCH}/kubectl \
    && chmod +x /usr/local/bin/kubectl

WORKDIR /azp/agent

RUN curl -fsSL -o agent.tar.gz \
      https://vstsagentpackage.azureedge.net/agent/${AGENT_VERSION}/vsts-agent-linux-x64-${AGENT_VERSION}.tar.gz \
    && tar zxvf agent.tar.gz \
    && rm agent.tar.gz \
    && ./bin/installdependencies.sh

RUN useradd -m -s /bin/bash agent \
    && chown -R agent:agent /azp

COPY start.sh /azp/start.sh
RUN chmod +x /azp/start.sh

USER agent
ENTRYPOINT ["/azp/start.sh"]

The installdependencies.sh script ships with the agent and pulls OS packages it needs at runtime; running it at build time means cold pod start does not pay for it. The user is non-root, which the agent requires anyway. The Docker CLI is in the image; we do not run Docker-in-Docker, we mount the host's docker socket via a hostPath in the pod spec. We pin the agent version because a silent bump once introduced a node10 deprecation warning that broke a task one team used and we lost half a day to it.

The start script

This is the bit that registers the agent against the org, points it at a pool, and runs exactly one job. The --once flag is the entire reason this whole architecture works.

#!/bin/bash
set -euo pipefail

if [ -z "${AZP_URL:-}" ]; then
  echo "AZP_URL is required" >&2
  exit 1
fi

if [ -z "${AZP_TOKEN_FILE:-}" ]; then
  if [ -z "${AZP_TOKEN:-}" ]; then
    echo "AZP_TOKEN or AZP_TOKEN_FILE is required" >&2
    exit 1
  fi
  AZP_TOKEN_FILE=/azp/.token
  echo -n "${AZP_TOKEN}" > "${AZP_TOKEN_FILE}"
fi

unset AZP_TOKEN

cd /azp/agent

cleanup() {
  if [ -e config.sh ]; then
    ./config.sh remove --unattended \
      --auth pat \
      --token "$(cat "${AZP_TOKEN_FILE}")" || true
  fi
}

trap 'cleanup; exit 130' INT
trap 'cleanup; exit 143' TERM

./config.sh --unattended \
  --agent "${AZP_AGENT_NAME:-$(hostname)}" \
  --url "${AZP_URL}" \
  --auth pat \
  --token "$(cat "${AZP_TOKEN_FILE}")" \
  --pool "${AZP_POOL:-aks-linux}" \
  --work "${AZP_WORK:-_work}" \
  --replace \
  --acceptTeeEula

./run.sh --once

AZP_AGENT_NAME defaults to the pod's hostname, which Kubernetes sets to the pod name. KEDA's ScaledJob creates pod names like aks-linux-agent-9c4f2-bxq7p, so every agent registering against the pool has a unique name. The first version of this script used a static name and the agents collided; the second registration would refuse with The agent name is already in use. The --replace flag combined with a unique-per-pod name is the fix.

./run.sh --once is the heart of the thing. The agent picks up exactly one job from the pool, runs it to completion (or failure), then exits. KEDA sees the pod terminate and removes the job object after successfulJobsHistoryLimit is exceeded. The entire lifecycle of an agent is one pod that exists for the duration of one pipeline job.

The cleanup trap deregisters the agent on SIGTERM or SIGINT, which matters when a spot node is reclaimed mid-job. Without it, Azure DevOps holds the registration for ten minutes assuming the agent will come back, and the pool shows a phantom offline agent.
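
If you suspect phantom registrations are piling up after an eviction wave, the pool can be checked from the CLI. A quick sketch, assuming the azure-devops extension is installed and logged in; 41 is the pool ID that appears in the troubleshooting section later:

# List agents still registered against the pool and show only the offline ones.
az pipelines agent list \
  --organization https://dev.azure.com/contoso \
  --pool-id 41 \
  --query "[?status=='offline'].name" -o tsv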

The KEDA ScaledJob

Once the agent image and start script are in place, the only remaining piece is telling KEDA to create one job per queued build. This is the ScaledJob resource we run in production, slightly simplified.

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azp-agent-linux
  namespace: agents
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 0
    template:
      metadata:
        labels:
          azp-pool: aks-linux
      spec:
        serviceAccountName: azp-agent
        terminationGracePeriodSeconds: 30
        tolerations:
          - key: "agents"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          - key: "kubernetes.azure.com/scalesetpriority"
            operator: "Equal"
            value: "spot"
            effect: "NoSchedule"
        nodeSelector:
          agentpool: agents
        containers:
          - name: agent
            image: acr.azurecr.io/devops/agent:2.227.1-ubuntu2204
            imagePullPolicy: IfNotPresent
            env:
              - name: AZP_URL
                value: "https://dev.azure.com/contoso"
              - name: AZP_POOL
                value: "aks-linux"
              - name: AZP_TOKEN_FILE
                value: "/azp/secrets/token"
            volumeMounts:
              - name: azp-token
                mountPath: /azp/secrets
                readOnly: true
              - name: docker-sock
                mountPath: /var/run/docker.sock
            resources:
              requests:
                cpu: "1500m"
                memory: "4Gi"
              limits:
                cpu: "4000m"
                memory: "8Gi"
        volumes:
          - name: azp-token
            secret:
              secretName: azp-registration-token
          - name: docker-sock
            hostPath:
              path: /var/run/docker.sock
              type: Socket
  pollingInterval: 15
  maxReplicaCount: 30
  minReplicaCount: 0
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: azure-pipelines
      metadata:
        poolName: "aks-linux"
        targetPipelinesQueueLength: "1"
        organizationURLFromEnv: "AZP_URL"
      authenticationRef:
        name: azp-trigger-auth

pollingInterval: 15 means KEDA queries the Azure DevOps REST API every 15 seconds asking how many jobs are queued against this pool. targetPipelinesQueueLength: "1" means: for every queued job, spawn one agent pod. The scaler's maths is desiredReplicas = ceil(queueLength / targetPipelinesQueueLength). With a target of 1, one queued job creates one pod. A higher target would be wrong for our case because each agent runs --once.

KEDA's actual REST call is GET https://dev.azure.com/{org}/_apis/distributedtask/pools/{poolId}/jobrequests. The pool jobrequests endpoint is the documented surface; the scaler filters by status and counts the ones with no assigned agent. The poolName resolves to a pool ID that KEDA caches.
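
You can hit the same endpoint by hand when second-guessing a scaling decision. A rough equivalent of what the scaler counts; the jq filter is my approximation of "queued and not yet assigned to an agent":

# The PAT needs Agent Pools (Read); 41 is the aks-linux pool's ID.
curl -s -u ":${AZP_TOKEN}" \
  "https://dev.azure.com/contoso/_apis/distributedtask/pools/41/jobrequests" |
  jq '[.value[] | select(.result == null and .reservedAgent == null)] | length'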

maxReplicaCount: 30 is our ceiling. We arrived at 30 after measuring the AKS API server's behaviour with 30 concurrent pod creations. Higher numbers caused intermittent scheduling delays of 4-6 seconds, which we did not love.

backoffLimit: 0 is deliberate. The job inside the pod runs --once and exits. Kubernetes' job retry would re-run the agent registration in a new pod, which Azure DevOps will accept but the original pipeline run has already moved on. We tell Kubernetes never to retry; KEDA will create a new job if the queue is still non-empty.

The registration token, and how we got rid of it

The above ScaledJob references a Kubernetes secret called azp-registration-token that holds a PAT. The PAT has scope Agent Pools (Read & manage) and lives for 30 days. The KEDA scaler also needs a token to query the pools API, referenced via authenticationRef: azp-trigger-auth.

The PAT is the obvious operational weak point. Three problems: rotation is manual unless you build automation, the token is visible in any kubectl-empowered debugging session, and a leaked PAT can register an arbitrary agent in the pool that then receives any job that lands.

For rotation, a CronJob runs every 25 days, calls the Azure DevOps REST API to mint a new PAT scoped to agent pools, and overwrites the secret. In-flight agents keep using their copy until they exit; new agents pick up the new value because the secret is mounted as a file and re-read on each pod start.
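
The CronJob's payload is roughly the following. This is a condensed sketch rather than the exact script, and it assumes the job can obtain an Entra token for the Azure DevOps resource (499b84ac-1321-427f-aa17-267ca6975798) through its own identity:

# Mint a 30-day PAT via the PAT Lifecycle API, then overwrite the registration secret.
ADO_RESOURCE="499b84ac-1321-427f-aa17-267ca6975798"
AAD_TOKEN=$(az account get-access-token --resource "${ADO_RESOURCE}" --query accessToken -o tsv)

NEW_PAT=$(curl -s -X POST \
  -H "Authorization: Bearer ${AAD_TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{\"displayName\":\"aks-agent-registration\",\"scope\":\"vso.agentpools_manage\",\"validTo\":\"$(date -u -d '+30 days' +%Y-%m-%dT%H:%M:%SZ)\",\"allOrgs\":false}" \
  "https://vssps.dev.azure.com/contoso/_apis/tokens/pats?api-version=7.1-preview.1" |
  jq -r '.patToken.token')

# In-flight agents keep their mounted copy; new pods read the updated value.
kubectl create secret generic azp-registration-token \
  --namespace agents \
  --from-literal=token="${NEW_PAT}" \
  --dry-run=client -o yaml | kubectl apply -f -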

For the KEDA scaler specifically, we moved off PAT entirely. The scaler authenticates via Azure AD workload identity; from KEDA v2.10 onwards, the azure-workload podIdentity provider on a TriggerAuthentication resource wires it in. The pattern is: create a user-assigned managed identity, federate it against the AKS OIDC issuer for the KEDA service account, and reference the identity in TriggerAuthentication.

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azp-trigger-auth
  namespace: agents
spec:
  podIdentity:
    provider: azure-workload
    identityId: "44444444-5555-6666-7777-888888888888"

And on the pod that KEDA runs (the operator, not the agent), the workload identity binding looks like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: keda-operator
  namespace: keda
  annotations:
    azure.workload.identity/client-id: "44444444-5555-6666-7777-888888888888"
  labels:
    azure.workload.identity/use: "true"

The federated credential on the managed identity points at the subject system:serviceaccount:keda:keda-operator and the issuer URL is the AKS cluster's OIDC issuer URL, queryable via az aks show -g $RG -n $CLUSTER --query oidcIssuerProfile.issuerUrl -o tsv. After this binding is in place, the KEDA scaler authenticates to Azure DevOps via an Entra-issued token and no PAT is needed for queue inspection.
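
The wiring itself is three az commands. A sketch where the identity name is a placeholder and the client ID in the manifests above comes from the az identity create output:

# The AKS cluster's OIDC issuer, needed for the federated credential.
ISSUER=$(az aks show -g rg-aks-platform -n aks-platform-prod \
  --query oidcIssuerProfile.issuerUrl -o tsv)

# The user-assigned identity the KEDA operator will run as.
az identity create -g rg-aks-platform -n id-keda-azp

# Trust tokens that the cluster issues for the keda-operator service account.
az identity federated-credential create \
  --resource-group rg-aks-platform \
  --identity-name id-keda-azp \
  --name keda-operator \
  --issuer "${ISSUER}" \
  --subject "system:serviceaccount:keda:keda-operator" \
  --audiences "api://AzureADTokenExchange"

The identity also has to be granted access inside the Azure DevOps organisation (at least Reader on the agent pool), otherwise the scaler's calls fail with the same 401 covered in the troubleshooting section.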

The agent registration itself still uses a PAT because the ./config.sh command takes one and we did not want to rewrite the agent's bootstrap flow. Reducing from two PATs to one, scoped to one operation, with automatic rotation, was the security review's actual ask. They signed off in February.

The node pool

The AKS user pool that runs agents is a Standard_D8s_v5 (8 vCPU, 32 GB) spot pool with eviction policy Delete, a spot max price of -1 (capped at the on-demand rate), and a minimum of 0 nodes. It has the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule so only pods that explicitly tolerate spot end up on it. The agents=true:NoSchedule taint, which the ScaledJob tolerates, keeps cluster add-ons off these nodes entirely.

A fallback regular pool of two Standard_D4s_v5 nodes exists for capacity guarantees. Without it, a spot eviction during a busy hour would queue jobs until new spot capacity returned, which on a Sunday afternoon in West Europe can mean 11 minutes of waiting. The two regular nodes cost roughly £180 a month and bound the worst case.

az aks nodepool add \
  --resource-group rg-aks-platform \
  --cluster-name aks-platform-prod \
  --name agentsspot \
  --node-count 0 \
  --min-count 0 \
  --max-count 30 \
  --enable-cluster-autoscaler \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-vm-size Standard_D8s_v5 \
  --node-taints "agents=true:NoSchedule,kubernetes.azure.com/scalesetpriority=spot:NoSchedule" \
  --labels "agentpool=agents,workload=ci"

Cluster autoscaler handles node-level scaling; KEDA handles pod-level scaling. End to end on spot capacity, the node-cold loop takes 47 to 90 seconds. On the fallback regular pool it is faster because the nodes are already there. The 23-second p50 queue time only happens when an agent pod can land on an existing node; keeping warm node capacity for the median case would cost more than the queue time difference is worth.

Why we threw away the persistent cache

The first design had a ReadWriteMany PVC mounted at /azp/_work shared across all agents in the pool, backed by Azure Files Premium. The theory was that nuget restores and Maven downloads would hit a warm cache and shave 30 seconds off every build.

After two weeks of running it, the cache helped about 60% of builds and hurt the other 40%. When two agents tried to write to the same _work/_tool directory concurrently (which they would, because parallel builds use the same Node version installer), the second one would block on a file lock or overwrite a half-written file. We saw mysterious EBUSY: resource busy or locked failures in test runs that always corresponded to a concurrent installer write.

Azure Files Premium write latency is around 4-7 ms per syscall, which adds up across the tens of thousands of syscalls a typical NuGet restore makes. We measured the build the cache was supposed to help most: a Bicep + .NET monorepo pulling 240 MB of packages. Warm cache: 1 minute 38 seconds. Cold: 1 minute 51 seconds. Thirteen seconds of saving, traded against occasional whole-build failures and a £140-a-month PVC.

We ripped it out. Every job pod gets an empty _work directory on an emptyDir volume backed by the node's local SSD. For pipelines where caching really mattered we moved to the built-in Cache@2 task, which stores artefacts in Azure DevOps' own storage keyed on a content hash. The same monorepo on Cache@2 builds in 1 minute 22 seconds because the cache content is specific to the lockfile and stored compressed.
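
For anyone making the same swap, the Cache@2 step looks roughly like this; the key and path are illustrative and keyed on the repo's lock file:

steps:
  - task: Cache@2
    displayName: 'restore package cache'
    inputs:
      key: 'nuget | "$(Agent.OS)" | **/packages.lock.json'
      restoreKeys: |
        nuget | "$(Agent.OS)"
      path: $(Pipeline.Workspace)/.nuget/packages
  - script: dotnet restore --packages $(Pipeline.Workspace)/.nuget/packages
    displayName: 'restore with cached packages'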

A specific gotcha worth repeating

Agent names. The pool name is shared across pods, fine. The agent name within the pool must be unique at any moment but can be reused after deregistration. KEDA scales rapidly during a queue burst; we have seen 14 pods come up in under three seconds. With hostname as the agent name, this works because Kubernetes guarantees unique pod names. We briefly experimented with AZP_AGENT_NAME=aks-linux-agent-${POOL_INDEX} for nicer pool dashboards, which broke spectacularly. Two pods with the same POOL_INDEX (KEDA does not coordinate index numbers across job objects) attempted to register the same name; the second succeeded because of --replace, but the first agent then received 403 Forbidden partway through running its job because Azure DevOps had revoked its credentials.

The fix is in the start script: default to hostname, which Kubernetes makes unique. We also set AZP_AGENT_NAME from the downward API for explicitness, but it resolves to the same thing.

env:
  - name: AZP_AGENT_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name

The error log on the first agent at the moment of revocation, for anyone Googling it later: Agent connect error: The HTTP request timed out after 00:01:00. Reconfiguring the agent. and then Could not connect to the server repeating until the pod terminates.

A sample pipeline using the new pool

The pipeline side is small. You change pool: and that is it. Below is a real azure-pipelines.yml from a Bicep-deploying pipeline that now runs on the AKS pool.

trigger:
  branches:
    include: [main]

pool:
  name: 'aks-linux'

variables:
  serviceConnection: 'sc-platform-prod-wif'

stages:
  - stage: Build
    jobs:
      - job: Lint
        steps:
          - checkout: self
          - task: Bash@3
            displayName: 'bicep lint'
            inputs:
              targetType: inline
              script: |
                az bicep build --file infra/main.bicep

  - stage: Deploy
    dependsOn: Build
    jobs:
      - deployment: Apply
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - checkout: self
                - task: AzureCLI@2
                  inputs:
                    azureSubscription: $(serviceConnection)
                    scriptType: bash
                    scriptLocation: inlineScript
                    inlineScript: |
                      az deployment group create \
                        --resource-group rg-platform-prod-weu \
                        --template-file infra/main.bicep \
                        --parameters @infra/prod.bicepparam

pool: { name: 'aks-linux' } tells Azure DevOps to dispatch the job to that named agent pool. The pool is empty until KEDA creates a pod that registers an agent against it; from the pipeline run's perspective, this is invisible. The job sits in queue state for the seconds it takes the pod to come up, then runs. Switching the same pipeline back to pool: { vmImage: 'ubuntu-latest' } works identically. That portability mattered for the migration because we could move pipelines individually and roll back any one in a single commit.
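
For completeness, rolling back is the same block pointing at a hosted image instead:

# Self-hosted AKS pool:
pool:
  name: 'aks-linux'

# Rolled back to Microsoft-hosted:
pool:
  vmImage: 'ubuntu-latest'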

Troubleshooting log from the first month

KEDA scaler returned error: 401 Unauthorized from https://dev.azure.com/contoso/_apis/distributedtask/pools/41/jobrequests is the most common one we hit early. Cause: the PAT in azp-trigger-auth had expired or had wrong scope. The required scope on the trigger PAT is Agent Pools (Read); not Agent Pools (Read & manage), just Read. Fix is to mint a new PAT and update the secret; KEDA picks it up on the next poll without a restart.

Agent connect error: The HTTP request timed out after 00:01:00 from inside the agent container. Cause: a network policy blocking outbound to dev.azure.com on port 443, or a misconfigured httpsProxy env var inherited from the cluster. In our case, an old Calico GlobalNetworkPolicy predating the AKS upgrade was filtering egress on certain ports. Fix was a specific egress allow rule for the agents namespace.

The agent name X is already in use. Specify a different name with the --agent argument. Cause: name collision under high parallelism. Fix is in the gotcha section above. If you see this after applying that fix, check that the downward API reference is not malformed and silently evaluating to the literal string metadata.name.

No agent found in pool aks-linux for job from inside an Azure DevOps pipeline. Cause: the pool exists but no agents have registered and KEDA is not running. Check kubectl logs -n keda deploy/keda-operator. I have most often seen the operator's service account missing workload identity binding annotations after a Helm upgrade reset them.

KEDA scaler returned error: connection refused against the Azure DevOps API. Cause: a temporary dev.azure.com outage, which happens roughly once a quarter. The scaler retries on its own poll interval. We page on "queue depth has been non-zero for more than 5 minutes without a corresponding pod creation," not on this directly.

The cost graph

The monthly Azure Pipelines line on the invoice for the migration period:

April 2024: £4,180. Microsoft-hosted only.
May 2024: £4,210. Same as before, no migration started.
June 2024: £3,840. First three pipelines moved. AKS cost negligible because the cluster already existed for other workloads.
July 2024: £2,710. Two-thirds of pipelines moved. Hosted lane count reduced from 10 to 4.
August 2024: £1,860. All migratable pipelines moved. Lane count down to 2 (for the pipelines that stayed hosted, intentionally, because they ran on Windows agents we did not want to self-host).
September 2024: £1,340. Lane count to 1. AKS line item visible at £170 for the agents pool specifically.
October 2024 to date: £1,210 on average, with the AKS line at £210 (some growth in node-hours as more teams adopted) and the hosted lane at £1,000 for one paid lane plus the free grant.

The £1,000 for one paid lane is what keeps the Windows-only and macOS pipelines running. We did not self-host Windows because the operational tax on a self-hosted Windows agent is genuinely higher: image bake takes longer, the agent's caching behaviour is different, and the spot pricing on Windows VMs is less aggressive. We will probably do it next year. macOS we will not self-host because we are not prepared to operate a fleet of Mac minis or pay the licensing for hosted Mac VMs at our volume.

The drop from £4,180 to £1,210 is the headline. The £1,210 includes our share of AKS, which would exist anyway for the platform's other workloads, so the marginal cost of the agents pool is closer to £210 a month against the hosted-only baseline of £4,180. That is the number that matters for the next round of "should we self-host more" conversations.

Where this stops being worth it

Self-hosting only makes sense if you have three things at once: an AKS cluster already running for other reasons (so the marginal infrastructure cost is small), a build volume that justifies the operational attention (we run roughly 600 builds a day; below 100 a day I would not bother), and a team willing to own the agent image and the rotation and the spot-eviction tuning. Any of those missing and the migration is a net loss against the simplicity of Microsoft-hosted.

The decision rule we use, for what it is worth: if your monthly hosted bill is under £1,500 and your queue p95 is under 90 seconds, stay hosted. The number of weekends I would not get back fighting agent image regressions is worth more than that delta. If your bill is climbing, your queue times are visibly bothering developers, and you already operate Kubernetes seriously, KEDA-based self-hosting is the right move and the operational tax is roughly two engineer-days a month after the initial migration settles. Less than I expected. Worth doing for us. Not worth doing for the team next door who run nine pipelines on hosted agents and are perfectly content.

The interesting second-order effect is the cultural one. Self-hosted agents made our pipeline owners think about what they were doing in their pipelines. The previous "throw an apt-get install step in there and forget about it" pattern stopped, because every install was now visible as a delay we owned. Two teams refactored their pipelines from twenty steps down to four, baked the toolchain into the image, and dropped their build times by half. That was not the goal of the migration. It was a side effect of putting the cost back where the work happens, which I now suspect is the actual value of self-hosting infrastructure. The cost graph is the headline; the change in how people think about their builds is the part that compounds.