Progressive Delivery on AKS: A Step-by-Step Guide using Flagger with Istio and FluxCD
In my previous post, we set up an Azure Kubernetes Service (AKS) cluster to automatically update images based on new image tags in a container registry. As soon as a new image was pushed to the registry, it was rolled out to the cluster.
But what if you don’t want an agent automatically pushing out new images without some sort of testing? 🤔
In this article, we’ll build upon Flux’s image update automation capability and add Flagger to implement a canary release strategy.
Flagger is a progressive delivery tool, implemented as a Kubernetes operator, that automates the promotion or rollback of deployments based on metrics analysis. It supports a variety of metrics providers, including Prometheus, Datadog, and New Relic, to name a few. It also works well with the Istio service mesh and can implement progressive traffic splitting between primary and canary releases.
The goal here is to harness the power of image update automation while implementing some sort of gating process around it.
Here is the intended workflow:
- Modify application code, then commit and push the change to the repo.
- Create a new release in GitHub, which kicks off a release workflow to build and push an updated container image to GitHub Container Registry.
- FluxCD detects the new image and updates the image tag in a YAML manifest.
- FluxCD rolls out the new image to the cluster.
- Flagger detects a new deployment revision and starts a canary deployment.
- Flagger progressively routes traffic to the new deployment based on metrics.
- Flagger promotes the new deployment to production if the metrics are within the threshold.
We’ll move really fast through the AKS cluster provisioning, bootstrapping process, and deploying the AKS Store Demo sample app.
If you want a closer look at how image update automation is configured using FluxCD, check out my previous post.
Let’s go!
Prerequisites
Before you begin, you need to have the following:
- An Azure subscription and the Azure CLI
- A GitHub account and the GitHub CLI
- The Flux CLI
- kubectl and jq installed locally
Create an AKS cluster and bootstrap FluxCD
Run the following command to log into Azure and make sure you have the AzureServiceMeshPreview feature registered.
az login
az feature register --namespace "Microsoft.ContainerService" --name "AzureServiceMeshPreview"
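Feature registration can take a few minutes to complete. Optionally, you can check the registration state and then refresh the Microsoft.ContainerService resource provider before moving on.
az feature show --namespace "Microsoft.ContainerService" --name "AzureServiceMeshPreview" --query properties.state
az provider register --namespace Microsoft.ContainerService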
Next, run the following command to set up some variables for your deployment.
RG_NAME=rg-flagger
AKS_NAME=aks-flagger
LOC_NAME=westus2
We’ll deploy an AKS cluster with the Istio service mesh add-on enabled. If you are unfamiliar with service mesh in general, check out the Istio documentation and my previous post on Service Mesh Considerations for more information.
If you still have the AKS cluster from the previous post, you might want to delete it and start fresh.
Run the following commands to create the resource group and AKS cluster with the Istio add-on enabled.
az group create -n $RG_NAME -l $LOC_NAME
az aks create -n $AKS_NAME -g $RG_NAME --enable-azure-service-mesh --generate-ssh-keys -s Standard_B4s_v2
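The cluster deployment will take several minutes. Once it completes, you can optionally confirm that the Istio add-on is enabled by inspecting the cluster’s service mesh profile.
# should return "Istio"
az aks show -n $AKS_NAME -g $RG_NAME --query serviceMeshProfile.mode -o tsv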
Istio offers internal and external ingress capabilities that control traffic coming into the cluster. We’ll use the external ingress as our entry point for the sample app.
Run the following command to enable the external ingress gateway.
az aks mesh enable-ingress-gateway \
-n $AKS_NAME \
-g $RG_NAME \
--ingress-gateway-type external
After the cluster and Istio external ingress gateway are deployed, run the following command to connect to the cluster.
az aks get-credentials -n $AKS_NAME -g $RG_NAME
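Optionally, verify that the mesh control plane and the external ingress gateway are running. The Istio add-on runs istiod in the aks-istio-system namespace and the external gateway in the aks-istio-ingress namespace.
kubectl get pods -n aks-istio-system
kubectl get pods -n aks-istio-ingress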
Let’s move on to bootstrapping FluxCD and deploying the AKS Store Demo app.
Bootstrap cluster using FluxCD
We’ll use the GitHub CLI to work with GitHub and the Flux CLI to generate new Flux manifests, so be sure you have both of these tools installed.
Connect to GitHub using the GitHub CLI
Run the following command to log into GitHub.
gh auth login --scopes repo,workflow
Fork and clone the AKS Store Demo repo
If you’re continuing from the previous post, you should already have the AKS Store Demo repo forked and cloned. If not, run the following commands to fork and clone the repo.
gh repo fork https://github.com/azure-samples/aks-store-demo.git --clone
cd aks-store-demo
gh repo set-default
Create a release workflow
If you’re continuing from the previous post, you should already have a release workflow in the AKS Store Demo repo. If not, make sure you are in the root of the aks-store-demo repository and run the following commands to create a release workflow.
# download the release workflow
wget -O .github/workflows/release-store-front.yaml https://raw.githubusercontent.com/pauldotyu/aks-store-demo/main/.github/workflows/release-store-front.yaml
# download the TopNav.vue file which we'll be modifying
wget -O src/store-front/src/components/TopNav.vue https://raw.githubusercontent.com/pauldotyu/aks-store-demo/main/src/store-front/src/components/TopNav.vue
# commit and push
git add -A
git commit -m "feat: add release workflow"
git push
# back out to the previous directory
cd -
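For reference, the release workflow you just downloaded follows a common pattern: when a GitHub release is published, build the store-front image and push it to GitHub Container Registry tagged with the release version. A simplified sketch of that shape looks like the following (the actual workflow file in the repo may differ, and the image name here is illustrative).
name: release-store-front
on:
  release:
    types: [published]
permissions:
  contents: read
  packages: write
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # log into GitHub Container Registry using the built-in token
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      # build the store-front image and push it tagged with the release version
      - uses: docker/build-push-action@v5
        with:
          context: src/store-front
          push: true
          tags: ghcr.io/${{ github.repository_owner }}/aks-store-demo/store-front:${{ github.event.release.tag_name }}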
Fork and clone the AKS Store Demo Manifests repo
The great thing about Flux is that it can be bootstrapped using GitOps. So we’ll point to a branch in my AKS Store Demo Manifests repo, which has everything we need to get the cluster set up quickly.
If you’re continuing from the previous post, you should already have the AKS Store Demo Manifests repo forked and cloned. If not, run the following commands to fork and clone it.
gh repo fork https://github.com/pauldotyu/aks-store-demo-manifests.git --clone
cd aks-store-demo-manifests
gh repo set-default
I have updated the manifests to include Istio resources in the istio branch. We’ll use this branch to bootstrap the cluster.
git fetch
git checkout --track origin/istio
# get the latest from upstream
git fetch upstream istio
git rebase upstream/istio
Secrets for FluxCD Image Update Automation
As mentioned in my previous post, we’ll need to create a Flux secret to allow Flux to write to our GitHub repo.
Run the following command to create a namespace to land the Kubernetes secret into.
kubectl create namespace flux-system
Run the following commands to set your GitHub info.
# make sure you are in the aks-store-demo-manifests repo
export GITHUB_USER=$(gh api user --jq .login)
export GITHUB_TOKEN=$(gh auth token)
export GITHUB_REPO_URL=$(gh repo view --json url | jq .url -r)
Run the following command to create the secret.
flux create secret git aks-store-demo \
--url=$GITHUB_REPO_URL \
--username=$GITHUB_USER \
--password=$GITHUB_TOKEN
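The Flux CLI creates the secret in the flux-system namespace by default, so you can confirm it landed with the following command.
kubectl get secret aks-store-demo -n flux-system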
Update the GitRepository URL in a couple of Flux manifests to point to your repo.
sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/flux-system/gotk-sync.yaml > tmp && mv tmp clusters/dev/flux-system/gotk-sync.yaml
sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/aks-store-demo-source.yaml > tmp && mv tmp clusters/dev/aks-store-demo-source.yaml
sed "s/pauldotyu/${GITHUB_USER}/g" clusters/dev/aks-store-demo-store-front-image.yaml > tmp && mv tmp clusters/dev/aks-store-demo-store-front-image.yaml
git add -A
git commit -m 'feat: update git sync url'
git push
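For context, after the sed edits the GitRepository source in aks-store-demo-source.yaml points at your fork of the application repo and references the git secret created earlier. It will look roughly like this (a sketch; the exact intervals and branch in the repo’s manifest may differ).
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: aks-store-demo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/<your-github-user>/aks-store-demo
  ref:
    branch: main
  secretRef:
    name: aks-store-demo # the git secret created above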
We can now bootstrap our cluster with FluxCD using the manifests in the istio branch of the AKS Store Demo Manifests repo.
flux bootstrap github \
--owner=$GITHUB_USER \
--repository=aks-store-demo-manifests \
--personal \
--path=./clusters/dev \
--branch=istio \
--reconcile \
--network-policy \
--components-extra=image-reflector-controller,image-automation-controller
After a minute or two, run the following command to watch the bootstrap process.
flux logs --kind=Kustomization --name=aks-store-demo -f
# press ctrl-c to exit
Once the Kustomization reconciliation process is complete, run the following command to retrieve the public IP address of the Istio ingress gateway.
echo "http://$(kubectl get svc -n aks-istio-ingress aks-istio-ingressgateway-external -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
You should see the AKS Store Demo app running in your browser.
Install Flagger
Time to install Flagger, the GitOps way!
We’ll use the Flux CLI to generate the Flagger manifests and commit them to our repo.
Run the following command to generate a HelmRepository resource for Flagger’s Helm chart.
flux create source helm flagger \
--url=oci://ghcr.io/fluxcd/charts \
--export > ./clusters/dev/flagger-source.yaml
Run the following command to create a values.yaml file which will be used to configure Flagger.
cat <<EOF > values.yaml
meshProvider: istio
prometheus:
  install: true
EOF
Here, we are telling Flagger to use Istio as the service mesh provider and to install Prometheus to collect metrics.
Next, we need to create a HelmRelease resource to install Flagger, passing in the values.yaml file we just created to configure it.
flux create helmrelease flagger \
--target-namespace=flagger-system \
--create-target-namespace=true \
--crds CreateReplace \
--source=HelmRepository/flagger \
--chart=flagger \
--values=values.yaml \
--export > ./clusters/dev/flagger-helmrelease.yaml
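The exported HelmRelease embeds the values from values.yaml directly in its spec, so the generated file is self-contained. It will look roughly like this (a sketch; the apiVersion and defaults emitted by your Flux CLI version may differ).
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: flagger
  namespace: flux-system
spec:
  interval: 1m
  chart:
    spec:
      chart: flagger
      sourceRef:
        kind: HelmRepository
        name: flagger
  install:
    createNamespace: true
    crds: CreateReplace
  upgrade:
    crds: CreateReplace
  targetNamespace: flagger-system
  values:
    meshProvider: istio
    prometheus:
      install: true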
You don’t need the values.yaml file anymore, so run the following command to delete it.
rm values.yaml
Flagger can also run load tests against your application to generate metrics. We’ll use its load testing service to generate load against our application.
Flagger’s load testing service can be installed via a Kustomization resource based on manifests packaged as an artifact in an Open Container Initiative (OCI) registry.
Run the following command to create an OCIRepository resource pointing to the Flagger manifests in the OCI registry.
flux create source oci flagger-loadtester \
--url=oci://ghcr.io/fluxcd/flagger-manifests \
--tag-semver=1.x \
--export > ./clusters/dev/flagger-loadtester-source.yaml
Run the following command to create a Kustomization resource for the installation manifests.
flux create kustomization flagger-loadtester \
--target-namespace=dev \
--prune=true \
--interval=6h \
--wait=true \
--timeout=5m \
--path=./tester \
--source=OCIRepository/flagger-loadtester \
--export > ./clusters/dev/flagger-loadtester-kustomization.yaml
We’re ready to commit our changes to our repo.
# pull the latest changes from the repo
git pull
# add the new files and commit the changes
git add -A
git commit -m 'feat: add flagger'
git push
This will trigger a FluxCD reconciliation and install Flagger in our cluster.
After a minute or two, run any of the following commands to see the status of the new resources.
flux get source helm
flux get source chart
flux get source oci
flux get helmrelease
flux get kustomization
Confirm that Flagger is installed and running.
kubectl get deploy -n flagger-system
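Once the load tester Kustomization reconciles, you should also see its deployment in the dev namespace.
kubectl get deploy -n dev flagger-loadtester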
Deploy a Canary
With Flagger installed, a Canary Custom Resource Definition (CRD) is available to us.
A Canary resource automates several of our Kubernetes resources: it will create a Service, a VirtualService, and a canary deployment for us, so we don’t need to create these resources ourselves.
Open the ./base/store-front.yaml manifest file using your favorite editor and remove the Service resource.
The Service resource looks like this.
---
apiVersion: v1
kind: Service
metadata:
  name: store-front
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: store-front
Next, remove the VirtualService resource.
The VirtualService resource looks like this.
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: store-front
spec:
  hosts:
    - "*"
  gateways:
    - store-front
  http:
    - route:
        - destination:
            host: store-front
            port:
              number: 80
Finally, add the following Canary resource to the end of the manifest file.
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: store-front
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-front
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: 8080 # make sure this matches container port
    portDiscovery: true
    hosts:
      - "*"
    gateways:
      - store-front
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 20
    stepWeight: 10
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
      - name: error-rate
        thresholdRange:
          max: 10
        interval: 30s
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.dev/
        timeout: 60s
        metadata:
          type: bash
          cmd: "curl -s http://store-front-canary.dev"
      - name: load-test
        url: http://flagger-loadtester.dev/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://store-front-canary.dev"
There’s a lot going on here, but essentially we are telling Flagger to create a canary deployment for our store-front app, to route traffic to it through the Istio ingress gateway, and to use the load testing service to generate load, collect metrics, and analyze them to determine whether the canary should be promoted to primary.
Note the threshold, maxWeight, and stepWeight values in the Canary manifest. threshold is the number of failed metric checks Flagger will tolerate before rolling back. stepWeight and maxWeight control the traffic shift: we start by directing 10% of traffic to the canary and increase it in steps of 10%. Once the analysis succeeds at 20% traffic, Flagger promotes the canary to primary, which then receives 100% of traffic. A maxWeight of 20 is intentionally low here to speed up the testing process; normally you would set it closer to 100.
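For a slower, more production-like rollout, you might stretch the analysis out, for example (illustrative values only):
analysis:
  interval: 1m
  threshold: 5
  maxWeight: 80
  stepWeight: 10
With these values, traffic would step from 10% up to 80% in 10% increments before the canary is promoted.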
Commit and push the changes to your repo.
git add ./base/store-front.yaml
git commit -m 'feat: add store-front canary'
git push
Get Flux to reconcile the changes.
flux reconcile kustomization aks-store-demo --with-source
After Flux has finished reconciling, wait a minute or two, then run the following command to watch the canary deployment and wait until the status is Initialized.
kubectl get canary -n dev store-front -w
# press ctrl-c to exit
You can view Flagger logs with the following command:
kubectl logs -n flagger-system deployment/flagger-system-flagger
Run the following command to view important resources.
kubectl get service,destinationrule,virtualservice -n dev
Flagger has created Service resources for the app (the apex store-front service plus store-front-primary and store-front-canary), two DestinationRule resources (one for the canary and one for the primary), and one VirtualService resource that handles traffic shifting between the two. Right now, the primary service is weighted to receive 100% of the traffic, and you can confirm this by running the following command.
kubectl get virtualservice -n dev store-front -o yaml
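In the output, the HTTP route weights should look similar to this (trimmed):
http:
  - route:
      - destination:
          host: store-front-primary
        weight: 100
      - destination:
          host: store-front-canary
        weight: 0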
With all this in place, ensure you can still access the AKS Store Demo app in your browser.
Test the Canary
Now we can test the progressive deployment of our canary.
Flip back to the aks-store-demo repo so that we can make another change to the TopNav.vue file.
Run the following commands to update the version number in the TopNav.vue file from 1.0.0 to 2.0.0.
# set the version number
PREVIOUS_VERSION=1.0.0
CURRENT_VERSION=2.0.0
# make sure you are in the aks-store-demo directory
sed "s/Azure Pet Supplies v${PREVIOUS_VERSION}/Azure Pet Supplies v${CURRENT_VERSION}/g" src/store-front/src/components/TopNav.vue > TempTopNav.vue
mv TempTopNav.vue src/store-front/src/components/TopNav.vue
Commit and push the changes to your repo.
git add -A
git commit -m "feat: update title again"
git push
Create a new release in GitHub and watch the magic happen!
gh release create $CURRENT_VERSION --generate-notes
Wait a few seconds then run the following command to watch the release build.
gh run watch
With the new image built, the Flux ImagePolicy resource will reconcile, detect the new image tag, and trigger the ImageUpdateAutomation resource’s reconciliation process.
The new image tag will be written to the kustomization.yaml manifest, and the sample app’s Kustomization resource will reconcile and update its Deployment.
Here is where Flagger picks up the baton. Flagger will detect a new deployment revision and trigger a canary deployment. It will progressively route traffic to the new deployment based on the metrics we defined in the Canary
resource. If the metrics are within the threshold, Flagger will promote the new deployment as the primary.
You can run the following commands to watch the image update process.
# watch image policy
flux logs --kind=ImagePolicy --name=store-front -f
# watch kustomization
flux logs --kind=Kustomization --name=aks-store-demo -f
# confirm the image tag was updated
kubectl get deploy -n dev store-front -o yaml | grep image:
You can then run the following command to watch the canary deployment.
kubectl logs -n flagger-system deployment/flagger-system-flagger -f | jq .msg
By the end of the Canary deployment process, you should see the following messages.
"New revision detected! Scaling up store-front.dev"
"Starting canary analysis for store-front.dev"
"Pre-rollout check acceptance-test passed"
"Advance store-front.dev canary weight 10"
"Advance store-front.dev canary weight 20"
"Copying store-front.dev template spec to store-front-primary.dev"
"Routing all traffic to primary"
"Promotion completed! Scaling down store-front.dev"
Now you can refresh the AKS Store Demo app in your browser and see the new version of the app 🥳
Conclusion
Image update automation is cool, but it’s even cooler when you can implement some sort of gating process around it. Flagger helps you implement progressive delivery strategies in your Kubernetes cluster: it works well with Istio, can be configured to use a variety of metrics providers, and will automatically roll back a deployment if it detects an issue. It’s a great addition to your GitOps tool belt and will help you automate your deployments with confidence.
We’ve covered a lot of ground when it comes to GitOps and AKS but we have only scratched the surface. So stay tuned for more GitOps goodness!
Continue the conversation
If you have any feedback or suggestions, please feel free to reach out to me on Twitter or LinkedIn.
You can also find me in the Microsoft Open Source Discord, so feel free to DM me or drop a note in the cloud-native channel where my team hangs out!
Peace ✌️