
Streamline Network Observability on AKS: A Step-by-Step Guide to Enabling the AKS Add-on with Terraform

2023-07-10 11 min read Tutorial

Have you ever had to troubleshoot network issues in your Kubernetes clusters? If so, you know how challenging it can be to identify and resolve problems.

To troubleshoot network issues, you've probably had to use a combination of tools like kubectl, tcpdump, Wireshark, and netstat. The list goes on and on… While these tools are great for debugging and capturing network logs and traces, they don't provide a holistic view of your cluster's network traffic.

The good news is that there’s a better way!

A few weeks ago, the Network Observability add-on for AKS was announced. This add-on, currently in preview, provides a simple way to enable network observability for your AKS clusters. It's an eBPF-based solution that scrapes metrics from Kubernetes workloads and exposes them in Prometheus format, which lets you use tools like Grafana to visualize your cluster's network traffic. You can either bring your own Prometheus and Grafana or use the Azure-managed versions.
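
To give you a feel for what gets exposed, here are a few of the node-level metric names from the preview documentation, shown in Prometheus exposition format. The label sets and values here are illustrative, not actual output:

networkobservability_forward_count{direction="egress"} 1509
networkobservability_forward_bytes{direction="ingress"} 98231044
networkobservability_drop_count{direction="egress",reason="IPTABLE_RULE_DROP"} 12
networkobservability_tcp_state{state="ESTABLISHED"} 87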

The AKS docs include a step-by-step guide for enabling the add-on using the Azure CLI.

In this blog post, I'll walk you through enabling the add-on using Terraform.

Before you begin

You should have an Azure subscription and the Azure CLI installed. You’ll also need to install the Terraform CLI.
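
You can quickly confirm the tooling is in place before moving on:

az version
terraform version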

If you have all of the above, you’re ready to get started!

Run the following command to log in to your Azure account using the Azure CLI:

az login

With the network observability add-on being in preview, you’ll need to register the NetworkObservabilityPreview feature by running the following command:

az feature register \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

NOTE: This command can take a few minutes to complete. You can check the status of the feature registration using the following command:

az feature show \
  --namespace "Microsoft.ContainerService" \
  --name "NetworkObservabilityPreview"

You can proceed once the feature's state shows as Registered.
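
Tip: you can narrow the output of az feature show to just the state field by appending --query "properties.state". Once the state reads Registered, refresh the resource provider registration so the change propagates (this is the standard AKS preview-feature workflow):

az provider register \
  --namespace "Microsoft.ContainerService"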

Overview of what we’ll be doing

If you’ve used the Azure CLI command to enable the network observability add-on in your AKS cluster, you’ll find that all it takes is a single flag (–enable-network-observability) to enable the feature and a few commands to wire up the AKS cluster to the Azure managed Prometheus and Grafana instances. I want to use Terraform to provision the add-on. It’s a bit more involved but worth knowing how it’s all wired up.

The process of enabling the network observability add-on using Terraform can be broken down into the following steps:

  1. Create an AKS cluster
  2. Create an Azure Monitor workspace with data collection rules, endpoints, and alerts for Prometheus
  3. Enable the network observability add-on for the AKS cluster
  4. Create an Azure Managed Grafana instance with proper role-based access control (RBAC) assignments so that you can log into Grafana and for Grafana to access the Azure Monitor workspace
  5. Import the Kubernetes / Networking dashboard into our Grafana instance

After following the steps above, we’ll deploy a sample application to the AKS cluster and explore the network observability dashboard.

NOTE: If you’re really curious to know what the --enable-network-observability flag does in Azure CLI, you can read through the source code here

Setting up Terraform providers

All my Terraform code can be found here. You can use this as a reference to follow along with the steps below.

Create a new Terraform configuration file named main.tf and add the following code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.62.1"
    }

    local = {
      source  = "hashicorp/local"
      version = "=2.4.0"
    }

    helm = {
      source  = "hashicorp/helm"
      version = "=2.10.1"
    }

    azapi = {
      source  = "Azure/azapi"
      version = "=1.7.0"
    }
  }
}

provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = false
    }
  }
}

provider "helm" {
  kubernetes {
    config_path = local_file.example.filename
  }
}

locals {
  name     = "neto11y${random_integer.example.result}"
  location = "eastus"
}

data "azurerm_client_config" "current" {}

Here we are defining the required Terraform providers and the Azure provider configuration. We are also defining a few local variables that will be used throughout the Terraform configuration.

Notice that we're using the azapi and helm providers in addition to the azurerm provider. The azapi provider is used to update our AKS cluster and enable the network observability add-on. Because this add-on is still in preview, it isn't yet available in azurerm, which makes this a great opportunity to use the azapi provider to update the AKS resource.

The helm provider is used to deploy a sample application to our AKS cluster. We’ll get to that later.

Deploy AKS and Azure Monitor workspace for Prometheus

Append the following code to your main.tf file:

resource "random_integer" "example" {
  min = 100
  max = 999
}

resource "azurerm_resource_group" "example" {
  name     = "rg-${local.name}"
  location = local.location
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "aks-${local.name}"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "aks-${local.name}"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_DS3_v2"
    os_sku     = "AzureLinux"
  }

  identity {
    type = "SystemAssigned"
  }

  monitor_metrics {
  }
}

resource "azurerm_monitor_workspace" "example" {
  name                = "amon-${local.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
}

resource "azurerm_monitor_data_collection_endpoint" "example" {
  name                = "msprom--${azurerm_resource_group.example.location}-${azurerm_kubernetes_cluster.example.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  kind                = "Linux"
}

resource "azurerm_monitor_data_collection_rule" "example" {
  name                        = "msprom--${azurerm_resource_group.example.location}-${azurerm_kubernetes_cluster.example.name}"
  resource_group_name         = azurerm_resource_group.example.name
  location                    = azurerm_resource_group.example.location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example.id

  data_sources {
    prometheus_forwarder {
      name    = "PrometheusDataSource"
      streams = ["Microsoft-PrometheusMetrics"]
    }
  }

  destinations {
    monitor_account {
      monitor_account_id = azurerm_monitor_workspace.example.id
      name               = azurerm_monitor_workspace.example.name
    }
  }

  data_flow {
    streams      = ["Microsoft-PrometheusMetrics"]
    destinations = [azurerm_monitor_workspace.example.name]
  }
}

# associate to a Data Collection Rule
resource "azurerm_monitor_data_collection_rule_association" "example_dcr_to_aks" {
  name                    = "dcr-${azurerm_kubernetes_cluster.example.name}"
  target_resource_id      = azurerm_kubernetes_cluster.example.id
  data_collection_rule_id = azurerm_monitor_data_collection_rule.example.id
}

# associate to a Data Collection Endpoint
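# note: the name argument is intentionally omitted below; an association with a
# data collection endpoint uses the default name "configurationAccessEndpoint"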
resource "azurerm_monitor_data_collection_rule_association" "example_dce_to_aks" {
  target_resource_id          = azurerm_kubernetes_cluster.example.id
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.example.id
}

This will deploy an AKS cluster and an Azure Monitor workspace. It will also create a data collection endpoint and a data collection rule that will collect Prometheus metrics from the AKS cluster and send them to the Azure Monitor workspace.

The random_integer resource is used to generate a random number that will be appended to the resource names to make them unique and get us around the Azure naming restrictions.

There are additional alert and recording rules you can configure for Prometheus, but they aren't necessary for this walkthrough, so we'll omit them to keep this post relatively short. You can view the code for that here for node metrics and here for k8s metrics.
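
That said, if you'd prefer to manage rules in Terraform as well, here's a minimal sketch using the azurerm_monitor_alert_prometheus_rule_group resource. Note that this resource may require a newer azurerm version than the one pinned above, and the recording rule shown is a single illustrative example rather than the full official rule set:

resource "azurerm_monitor_alert_prometheus_rule_group" "node_rules" {
  name                = "NodeRecordingRulesRuleGroup-${azurerm_kubernetes_cluster.example.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  cluster_name        = azurerm_kubernetes_cluster.example.name
  scopes              = [azurerm_monitor_workspace.example.id]

  # one illustrative node recording rule from the kubernetes-mixin rule set
  rule {
    enabled    = true
    record     = "instance:node_num_cpu:sum"
    expression = "count without (cpu, mode) (node_cpu_seconds_total{job=\"node\",mode=\"idle\"})"
  }
}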

Enable Network Observability add-on

Append the following code to your main.tf file:

resource "azapi_update_resource" "example" {
  type        = "Microsoft.ContainerService/managedClusters@2023-05-02-preview"
  resource_id = azurerm_kubernetes_cluster.example.id

  body = jsonencode({
    properties = {
      networkProfile = {
        monitoring = {
          enabled = true
        }
      }
    }
  })
  
  depends_on = [ 
    azurerm_monitor_data_collection_rule_association.example_dce_to_aks,
    azurerm_monitor_data_collection_rule_association.example_dcr_to_aks,
  ]
}

Here’s where we use the azapi_update_resource resource to enable the Network Observability add-on. You can see that we’re issuing a partial update to our AKS cluster resource. We’re only updating the networkProfile.monitoring.enabled property to true. This single flag enables the add-on.

Deploy Azure Managed Grafana and import dashboard

Append the following code to your main.tf file:

resource "azurerm_dashboard_grafana" "example" {
  name                = "amg-${local.name}"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location

  identity {
    type = "SystemAssigned"
  }

  azure_monitor_workspace_integrations {
    resource_id = azurerm_monitor_workspace.example.id
  }
}

resource "null_resource" "example" {
  provisioner "local-exec" {
    command = <<-EOT
      az grafana dashboard import \
        --name ${azurerm_dashboard_grafana.example.name} \
        --resource-group ${azurerm_resource_group.example.name} \
        --folder 'Managed Prometheus' \
        --definition 18814
    EOT
  }

  depends_on = [azurerm_role_assignment.example_amg_me]
}

resource "azurerm_role_assignment" "example_amon_me" {
  scope                = azurerm_monitor_workspace.example.id
  role_definition_name = "Monitoring Data Reader"
  principal_id         = data.azurerm_client_config.current.object_id
}

resource "azurerm_role_assignment" "example_amon_amg" {
  scope                = azurerm_monitor_workspace.example.id
  role_definition_name = "Monitoring Data Reader"
  principal_id         = azurerm_dashboard_grafana.example.identity[0].principal_id
}

resource "azurerm_role_assignment" "example_amg_me" {
  scope                = azurerm_dashboard_grafana.example.id
  role_definition_name = "Grafana Admin"
  principal_id         = data.azurerm_client_config.current.object_id
}

This will deploy an Azure Managed Grafana instance and import the AKS Network Observability dashboard into a folder called “Managed Prometheus”. It will also assign the necessary permissions to the Azure Managed Grafana instance and the Azure Monitor workspace. In the azurerm_dashboard_grafana resource definition, you can see that we’re using the azure_monitor_workspace_integrations block to integrate the Azure Managed Prometheus with Azure Managed Grafana.

Currently, there isn’t any way to automate the import of dashboards into Azure Managed Grafana using Terraform. This is why we’re using the null_resource resource to run the az grafana dashboard import Azure CLI command. This command will import the dashboard into Grafana. Here it is important that you have the proper permissions to import dashboards into Grafana. This is why we’re using the azurerm_role_assignment resource to assign the Grafana Admin role to the current user (you). This role also allows you to authenticate to the Grafana portal using your Azure AD credentials.

Deploy a sample application using Helm

Append the following code to your main.tf file:

resource "local_file" "example" {
  filename = "mykubeconfig"
  content  = azurerm_kubernetes_cluster.example.kube_config_raw
}

resource "helm_release" "example" {
  name       = "aks-store-demo"
  chart      = "../helm/aks-store-demo"

  depends_on = [
    azapi_update_resource.example
  ]
}

This deploys a sample application using the Helm provider. To authenticate to the AKS cluster, we use the local_file resource to write the kubeconfig to a local file called mykubeconfig. The Helm provider (configured at the top of main.tf) reads this file, and we'll also use it when running kubectl commands below.
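
As a side note, if you'd rather not write credentials to disk, the Helm provider can read the cluster credentials directly from the azurerm_kubernetes_cluster attributes. This would replace the provider "helm" block from earlier; I'm sticking with the kubeconfig file here since we'll reuse it with kubectl below:

provider "helm" {
  kubernetes {
    # pull connection details straight from the AKS resource attributes
    host                   = azurerm_kubernetes_cluster.example.kube_config[0].host
    client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_certificate)
    client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_key)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].cluster_ca_certificate)
  }
}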

Get the endpoint for Azure Managed Grafana

Append the following code to your main.tf file:

output "amg_endpoint" {
  value = azurerm_dashboard_grafana.example.endpoint
}

This will output the endpoint for Azure Managed Grafana. You can use this endpoint to access the Grafana portal.

Run the Terraform deployment

Before we run the Terraform deployment, you may need to run a few Azure CLI commands to enable the necessary features.

If you haven’t deployed Azure Managed Grafana in your subscription yet, you may need to register the resource provider by running the following command:

az provider register \
  --namespace Microsoft.Dashboard

As mentioned above, we'll be using the Azure CLI to import a dashboard into our Grafana instance. If you haven't used the Azure CLI to interact with Azure Managed Grafana before, you'll need to install the amg extension by running the following command:

az extension add --name amg

Now that we have all the code in place, we can run Terraform to deploy the resources. Run the following commands:

terraform init
terraform apply

This will initialize Terraform and deploy the resources. When prompted, type yes to confirm the deployment. The deployment will take a few minutes to complete.

Verify that the application is running

If you do not have kubectl installed, you can install it by running the following command:

az aks install-cli

With the Azure resources and application deployed, you can verify that everything is running with the following kubectl command:

kubectl --kubeconfig mykubeconfig get pod

You should see output similar to the following:

NAME                                READY   STATUS      RESTARTS        AGE
makeline-service-7777968887-b5jgh   1/1     Running     0               9m20s
mongodb-588bb45ff4-68mrx            1/1     Running     0               9m20s
order-service-646cd9fbbb-nz2md      1/1     Running     0               9m20s
product-service-646dcdfc4d-k8dxw    1/1     Running     0               9m20s
rabbitmq-74699bc7f9-5q67r           1/1     Running     0               9m20s
store-admin-86d6c8c9c6-l6rt6        1/1     Running     0               9m20s
store-front-fb98898d5-j8285         1/1     Running     0               9m20s
virtual-customer-577f759489-kjpqs   1/1     Running     0               9m20s
virtual-worker-77dfb6d9c9-lw45q     1/1     Running     0               9m20s
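
Judging by their names, the virtual-customer and virtual-worker pods simulate store traffic, so network metrics should start flowing without any manual effort. If you'd like to poke around the app yourself, you can list its services and look for an external IP (assuming the chart exposes one):

kubectl --kubeconfig mykubeconfig get svc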

Explore the Network Observability dashboard

Now that the application is running, you can explore the Network Observability dashboard. To do this, let’s get the endpoint for Azure Managed Grafana. Run the following command:

terraform output amg_endpoint

Open a browser and navigate to the Azure Managed Grafana endpoint. You should see a login page where you can log in using your Azure AD credentials. Once logged in, you'll see the navigation menu on the left side of the page. Click the Dashboards button.

Azure Managed Grafana dashboards button

You should see a list of dashboards. Click on the Managed Prometheus folder to expand it. Here you should see the Kubernetes / Networking dashboard. Click on the dashboard to open it.

Azure Managed Grafana Kubernetes Networking dashboard

From here you can explore the dashboard and see the metrics that are being collected. You'll notice there are several collapsible sections:

  • Traffic stats to view ingress and egress traffic packet rates
  • Drop stats to view drop counters, including iptables rule drops
  • Connection stats to view TCP and UDP connection counters
  • Interface stats to view and identify any issues with network interfaces

Azure Managed Grafana Kubernetes Networking dashboard

Conclusion

In this article, you learned how to streamline network observability on AKS using the Network Observability add-on: you deployed the add-on with Terraform, wired it up to Azure-managed Prometheus and Grafana, deployed a sample application with Helm, verified it was running, and explored the metrics in the Kubernetes / Networking dashboard.

Isn’t this a better way?!?

This is the way

This feature is currently in preview and will continue to improve over time. It is limited to node-level metrics at this time, and pod-level metrics are coming soon.

If you have any feedback or suggestions, please feel free to reach out to me on Twitter or LinkedIn.

Peace ✌️

Resources