Paul's Blog

A collection of notes and stuff I find interesting

Deploying ARM64 workloads to AKS

2022-11-02 6 min read Azure Kubernetes ARM64 Docker

You might have heard by now that Azure has partnered with Ampere to bring ARM-based processors to virtual machines on Azure. This is super exciting because it gives you an opportunity to deploy workloads on highly performant and power-efficient virtual machines, and these characteristics ultimately result in excellent price-performance (lower costs 🥳).

So… are you ready to deploy your workloads to ARM64 node pools on AKS? I sure wasn’t when attempting to deploy the azure-voting-app-redis application to my cluster.

In this article, I will cover some things you’ll need to consider before issuing that kubectl apply command against your AKS cluster to deploy your ARM64 workload.

Adding an ARM64 node pool to an AKS cluster

Adding an ARM64 node pool to an existing AKS cluster is straightforward; you simply add a new node pool. The quickest way to do this is by issuing the following Azure CLI command:

$ az aks nodepool add \
    --resource-group <RESOURCE_GROUP_NAME> \
    --cluster-name <AKS_CLUSTER_NAME> \
    --name <ARM64_NODEPOOL_NAME> \
    --node-count <ARM64_NODEPOOL_COUNT> \
    --node-vm-size <ARM64_NODEPOOL_VM_SIZE>

The command above is no different than adding any other node pool. The VM SKU you select for the node pool is what matters.

Selecting a SKU

The Azure ARM-based virtual machine families include the Dpsv5 and Dpdsv5-series, the Dplsv5 and Dpldsv5-series, and the Epsv5 and Epdsv5-series.

📝 NOTE

If you intend to use virtual machines with ephemeral OS disks for your node pool, you will need to select a VM SKU that offers a temp storage disk. This is usually indicated by a lowercase d in the SKU (example: Standard_D2pds_v5).
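Here is a minimal sketch of what that could look like, wrapping the same az aks nodepool add command in a small shell function and adding --node-osdisk-type Ephemeral. The function name and its three positional arguments are made up for illustration, and the node count and SKU are example values, not recommendations.

```shell
# Hypothetical helper: add an ARM64 node pool that uses an ephemeral OS disk.
# Arguments: <resource group> <cluster name> <node pool name>
add_arm64_ephemeral_nodepool() {
    az aks nodepool add \
        --resource-group "$1" \
        --cluster-name "$2" \
        --name "$3" \
        --node-count 1 \
        --node-vm-size Standard_D2pds_v5 \
        --node-osdisk-type Ephemeral
}

# Example call (placeholder names):
# add_arm64_ephemeral_nodepool my-rg my-aks armpool
```

Depending on how much temp storage the SKU offers, you may also need to pass --node-osdisk-size so the OS disk fits on it; I have left that out of the sketch.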

💡 TIP

Before you attempt to deploy your node pool, it is a good idea to run the following Azure CLI command to check if the SKU is available in your region:

$ az vm list-sizes \
    --location eastus \
    --query "[? contains(name, 'Standard_D2pds_v5')]" \
    --output table

Tainting the system node pool

After adding a new user node pool to your AKS cluster, you should taint your system node pool with CriticalAddonsOnly=true:NoSchedule so that it only runs system pods and application pods are prevented from being scheduled on your system nodes.

You can taint existing node pools using an Azure CLI command like this:

$ az aks nodepool update \
    --resource-group <RESOURCE_GROUP_NAME> \
    --cluster-name <AKS_CLUSTER_NAME> \
    --name <SYSTEM_NODEPOOL_NAME> \
    --node-taints "CriticalAddonsOnly=true:NoSchedule"
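To double-check that the taint landed, something like the following sketch lists every node next to its taint keys. The function name is hypothetical, and it assumes kubectl is already pointed at your AKS cluster.

```shell
# Hypothetical helper: print each node name alongside the keys of its taints.
# Nodes without taints show <none> in the TAINTS column.
show_node_taints() {
    kubectl get nodes \
        --output custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Example call:
# show_node_taints
```

System nodes should show CriticalAddonsOnly in the TAINTS column, while the ARM64 user nodes should not.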

Multi-platform container images

The next thing to consider is whether your container images support the ARM64 architecture.

As stated above, my attempt to deploy the azure-voting-app (which is used in many AKS quickstart guides) failed because its image is built to support only the AMD64 architecture.

This led me down a path of digging through image layers on Docker Hub and Dockerfiles on GitHub to find out where the AMD64-only dependency was being pulled in.

Dockerfile investigation

You can use the Docker CLI to inspect a container image manifest. I took a look at the container image (mcr.microsoft.com/azuredocs/azure-vote-front:v1) used in the quick start guide and confirmed it does not support ARM64 architecture.

Example:

$ docker manifest inspect mcr.microsoft.com/azuredocs/azure-vote-front:v1
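If you don't feel like reading the JSON by eye, a small filter can pull out just the architectures. This is a rough sketch using only grep, sed and sort; the function name is made up, and it assumes the manifest JSON arrives on standard input. As far as I can tell, an image that is not a multi-platform manifest list has no platform entries in this output, so an empty result suggests a single-platform image.

```shell
# Hypothetical helper: read manifest JSON on stdin and print the unique
# "architecture" values it contains, one per line, sorted.
list_architectures() {
    grep -o '"architecture": *"[^"]*"' |
        sed 's/.*"\([^"]*\)"$/\1/' |
        sort -u
}

# Example call:
# docker manifest inspect mcr.microsoft.com/azuredocs/azure-vote-front:v1 | list_architectures
```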

Possible solutions

This meant the container image needed to be rebuilt to support multiple architectures.

There are a couple of ways to do this:

  1. Create a Dockerfile for each platform, build each version, then create a manifest to represent the container image as a single image with multiple variants (we’ll see what that looks like below).
  2. Create a single Dockerfile and use Docker Buildx to build a container image for each platform you want to support by using the --platform flag and specifying the target platforms linux/amd64 and linux/arm64.

A single Dockerfile may not always work if you are building an application that requires special syntax for cross-compilation.

I opted for the Docker Buildx option as I was not excited about having to maintain multiple Dockerfiles. The azure-voting-app is a Python/Flask app, so I could get away with a single Dockerfile.

But… Docker Buildx will not magically make everything work.

You will still need to inspect the Dockerfile and ensure all its base layers also support ARM64 architecture. This is where you will need to do some investigation to see how the container is put together.

Here’s how the mcr.microsoft.com/azuredocs/azure-vote-front:v1 container image is made up

So what does this mean? Well, it means I couldn’t use the original Dockerfile and needed to build one from scratch based on python:3.6-buster 😓

After a bit of code borrowing, merging and bumping the Python version to 3.9, a new multi-platform compatible manifest was created and the Dockerfile can be found here.

Big shout out to @tiangolo, whose container images have been powering the AKS quickstart guide for all these years! 🎉

Building and publishing a multi-platform image with docker buildx

With a new Dockerfile, I used the docker buildx build command to build and push to my Azure Container Registry.

# create a new builder
$ docker buildx create --name mybuilder --driver docker-container --bootstrap --use

# log into azure container registry
$ az acr login --name cloudnativeadvocates

# build and push to azure container registry
$ docker buildx build --platform linux/amd64,linux/arm64 --tag cloudnativeadvocates.azurecr.io/azure-vote-front:v1.0.0 --push .

Now if you inspect the manifest for the cloudnativeadvocates.azurecr.io/azure-vote-front:v1.0.0 image, you will see it supports multiple platforms.

$ docker manifest inspect cloudnativeadvocates.azurecr.io/azure-vote-front:v1.0.0
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 6579,
         "digest": "sha256:0c13642cc335967d8a382ce39ce5b20338cefc47a404ac373a0f73ad13d1260a",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 6579,
         "digest": "sha256:682111c0a5f11e35355e8f30bd2134e03646aa3a8403baf200bf34481e8b1fed",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

Automate the build

Building locally to test is good, but I never like running the same commands over and over again. So I added a new GitHub Actions workflow to automate the build and push process each time a change is pushed to my forked repo. The workflow definition can be found here.

Updating Kubernetes deployment manifests

One final thing to consider is using a nodeSelector in the deployment pod spec. Polyglot architectures mean we’ll have components written in a variety of languages and compiled for multiple operating systems and architectures. So, setting a value of kubernetes.io/arch: "arm64" is a good idea to ensure your ARM64 workloads are never scheduled on non-ARM64 nodes.
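To make that concrete, here is a minimal sketch of a deployment with the nodeSelector in place, wrapped in a shell function that prints the manifest so it can be piped to kubectl. The function name, the deployment name, the labels, the replica count and the container port are assumptions for illustration; only the nodeSelector value and the image come from this article.

```shell
# Hypothetical helper: print a Deployment manifest pinned to ARM64 nodes.
arm64_deployment_manifest() {
    cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-vote-front
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-vote-front
  template:
    metadata:
      labels:
        app: azure-vote-front
    spec:
      # Only schedule these pods on nodes labeled with the ARM64 architecture
      nodeSelector:
        kubernetes.io/arch: "arm64"
      containers:
        - name: azure-vote-front
          image: cloudnativeadvocates.azurecr.io/azure-vote-front:v1.0.0
          ports:
            - containerPort: 80   # assumed port
EOF
}

# Example call:
# arm64_deployment_manifest | kubectl apply -f -
```

You can see which architecture label each node carries with kubectl get nodes -L kubernetes.io/arch.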

Summary

I hope this helps you prepare to take advantage of the highly performant and cost-efficient ARM-based node pools in AKS. As noted, deploying ARM64-based node pools is easy. What may not be easy is ensuring your container image and all its base layers support multiple platforms, so you may need to rewrite your Dockerfile in addition to modifying your container build process. From there, you should add a taint to the system node pool and a nodeSelector to your pod spec to ensure the application is scheduled on the appropriate nodes.

If you have any feedback or questions please reach out in the comments below or via Twitter @pauldotyu 😊

Now you give it a try

Try deploying the ARM64 version of the azure-voting-app to AKS by heading over to the AKS with ARM64 node pools lab. This lab will have you deploy infrastructure using Azure Bicep, but if you are more comfortable using Terraform, then use this Terraform deployment as an alternative path.

Til next time, cheers!

Resources