Scaling with Application Gateway Ingress Controller

How Application Gateway Ingress Controller (AGIC) works is depicted in the following diagram on its document site.

AGIC Architecture

Rather than pointing the backend pool of App Gateway to a Kuberntes service, AGIC updates it with pods’ IP addresses. The gateway load balance the traffic to pods directly. In this way, it simplifies the network configuration between the app gateway and the AKS cluster.

When the workload needs to scale out to handle the increasing user load, there are two parts that need to be considered, the scaling of the app gateway and the scaling of pods.

Scaling for Application Gateway

Application Gateway supports autoscaling. If you don’t change its default settings, it scales from 0 to 10 instances. However, setting the minimum instance to 0 is not a good idea for production environment. As it is mentioned in the high traffic support document, autoscaling takes 6 to 7 minutes to provision and scale out to additional instances. If the number of minimum instances is too small, app gateway may not be able to handle the spike of the traffic. You may see HTTP 504 error in this case.

The rational number of minimum instances should be based on Current Compute Unit metric. An app gateway instance can handle 10 compute units. You should monitor this metric to decide how many instances you need for the minimum instances.

Scaling for Pods

Kubernetes handles the autoscaling of pods if you use HPA for it. However, when using AGIC, you could probably see HTTP 502 error when pods scale down. Actually, the HTTP 502 error could happen in the following 3 situations when AGIC is in place:

  • You scale down the pods either manually or via HPA.
  • You are doing rolling update to workload.
  • Kubernetes evicts pods.

The issue is because the app gateway backend pool cannot be updated fast enough to match the changes on AKS side. This document has more details about this issue. It also discussed some workarounds, but the issue cannot be 100% bypassed. You should be aware of the potential HTTP 502 error when you are in one of the above situations.


Now we know the issues that we may face when the workload scales. Here are several recommendations which may help to minimize the chances of errors when you expect to handle increasing user loads.

  • Set proper values for the minimum and maximum instances of app gateway. Give 20% to 30% buffer to the minimum instances.
  • For critical workloads, pre-scale the pods and temporarily disable HPA to avoid unexpected scaled down before the peak load. Enable HPA or scale down pods when peak load is off.
  • Ensure the AKS cluster has enough resources, and the critical pods have the proper QoS so the pods won’t be evicted unexpectedly.
  • Plan the proper time to do rolling update.