The Essentials of Resource Management in Kubernetes

The resource requests and limits that we set for containers in a Pod spec are the key settings we can use to influence how Kubernetes schedules pods and manages the computational resources of nodes, typically CPU and memory. Understanding how resource requests and limits work is essential to understanding resource management in Kubernetes.

Resource requests and limits

The Kubernetes scheduler uses resource requests as one of the factors to decide which node a pod can be scheduled on. Rather than looking at the actual resource usage on each node, the scheduler compares the node allocatable with the sum of the resource requests of all pods running on the node. If you don't set resource requests, your pod may be scheduled on any node, because a pod with no requests doesn't need any unrequested capacity to fit. On the other hand, your pod may not get enough resources to run, or may even be terminated if the node is under resource pressure. Setting resource requests ensures containers get the minimum amount of resources they need. It also helps the kubelet determine the eviction order when necessary.
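To make the request-based rule concrete, here is a minimal Python sketch of the fit check described above. It is a simplification, not the scheduler's actual code: it assumes CPU in millicores and memory in bytes, and the names `Resources` and `fits_on_node` plus the sample numbers are hypothetical. Note that it never looks at actual usage, only at allocatable and the requests already on the node.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_bytes: int


def fits_on_node(pod_requests: Resources, allocatable: Resources,
                 existing_requests: list[Resources]) -> bool:
    """Return True if the pod's requests fit into the node's unrequested capacity."""
    free_cpu = allocatable.cpu_millicores - sum(r.cpu_millicores for r in existing_requests)
    free_mem = allocatable.memory_bytes - sum(r.memory_bytes for r in existing_requests)
    # A zero request is never treated as insufficient, so a pod without requests always passes.
    cpu_ok = pod_requests.cpu_millicores == 0 or pod_requests.cpu_millicores <= free_cpu
    mem_ok = pod_requests.memory_bytes == 0 or pod_requests.memory_bytes <= free_mem
    return cpu_ok and mem_ok


GiB = 1024 ** 3
node = Resources(cpu_millicores=1900, memory_bytes=5 * GiB)   # hypothetical allocatable
running = [Resources(1000, 2 * GiB), Resources(500, 2 * GiB)]  # requests of pods already there

print(fits_on_node(Resources(300, 1 * GiB), node, running))  # True
print(fits_on_node(Resources(500, 1 * GiB), node, running))  # False: only 400m CPU unrequested
print(fits_on_node(Resources(0, 0), node, running))          # True: no requests at all
```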

With resource limits, you set hard limits on the resources a container can use: the container cannot use more than its limits, and if it tries, there are consequences. CPU is compressible, so a container that tries to use more CPU than its limit has its CPU time throttled. Memory is incompressible, so a container that tries to use more memory than its limit is terminated with an OOMKilled error. If you don't set resource limits, the container could use all available resources on the node. On the other hand, it could become a noisy neighbor and could be terminated when the node is under resource pressure. Setting resource limits caps the maximum amount of resources a container can use.
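The different consequences for CPU and memory come from how limits end up as cgroup settings. The Python sketch below mimics that translation under stated assumptions: the default CFS period of 100,000 µs, a minimum quota of 1,000 µs, cgroup v1 semantics (`cpu.cfs_quota_us`, `memory.limit_in_bytes`), and a deliberately small quantity parser that only understands the `m`, `Ki`, `Mi`, and `Gi` suffixes. The helper names are hypothetical.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period, in microseconds
MIN_QUOTA_US = 1_000     # assumed lower bound applied to very small quotas


def cpu_limit_to_quota_us(cpu: str) -> int:
    """Convert a CPU limit such as '500m' or '2' into CPU time allowed per CFS period."""
    millicores = int(cpu[:-1]) if cpu.endswith("m") else int(float(cpu) * 1000)
    return max(millicores * CFS_PERIOD_US // 1000, MIN_QUOTA_US)


def memory_limit_to_bytes(memory: str) -> int:
    """Convert a memory limit such as '256Mi' into a byte count (simplified parser)."""
    suffixes = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in suffixes.items():
        if memory.endswith(suffix):
            return int(memory[: -len(suffix)]) * factor
    return int(memory)  # a plain number is already bytes


# CPU: the container may run 50 ms in every 100 ms period, then it is throttled until the next one.
print(cpu_limit_to_quota_us("500m"))   # 50000
# Memory: going past this many bytes makes the kernel's OOM killer step in.
print(memory_limit_to_bytes("256Mi"))  # 268435456
```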

If you specify resource limits for a container but don't specify its requests, Kubernetes automatically sets the requests to match the limits. The combination of these two settings also determines the QoS class of the pod.
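The following Python sketch shows one way to derive the QoS class from requests and limits, including the defaulting of missing requests to limits. It is a simplified model, not the Kubernetes source: it only looks at CPU and memory on regular containers (init containers are ignored), and it assumes quantities are written in the same notation so they can be compared as strings (for example, `"500m"` and `"0.5"` would wrongly be treated as different).

```python
RESOURCES = ("cpu", "memory")


def effective_requests(container: dict) -> dict:
    """Requests with any missing value defaulted to the matching limit."""
    requests = dict(container.get("requests", {}))
    for name, value in container.get("limits", {}).items():
        requests.setdefault(name, value)
    return requests


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' CPU/memory requests and limits."""
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    for c in containers:
        limits = c.get("limits", {})
        requests = effective_requests(c)
        # Guaranteed requires CPU and memory limits on every container, with requests equal to limits.
        if any(r not in limits or requests.get(r) != limits[r] for r in RESOURCES):
            return "Burstable"
    return "Guaranteed"


print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "250m"},
                  "limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Burstable
print(qos_class([{}]))                                              # BestEffort
```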

Since the scheduler only considers resource requests when scheduling pods, a node can be overcommitted: the sum of the resource limits of all pods on the node can exceed the node's allocatable resources. The node may then come under resource pressure, and if it does, especially under memory pressure, the pods running on it could be evicted.
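A quick way to see overcommitment is to compare both sums against allocatable. The Python sketch below does that for memory only; the node size and the (request, limit) pairs are hypothetical numbers chosen for illustration.

```python
def node_commitment(allocatable_mib: int, pods: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (requested %, limits %) of a node's allocatable memory.

    pods is a list of (request_mib, limit_mib) tuples, one per pod.
    """
    requested = sum(request for request, _ in pods)
    limited = sum(limit for _, limit in pods)
    return 100 * requested / allocatable_mib, 100 * limited / allocatable_mib


# Hypothetical node with 4096 MiB allocatable running three Burstable pods.
pods = [(1024, 2048), (1024, 2048), (1024, 2048)]
requested_pct, limits_pct = node_commitment(4096, pods)
print(f"requested: {requested_pct:.0f}%, limits: {limits_pct:.0f}%")
# requested: 75%, limits: 150%
```

The scheduler is satisfied because requests stay under 100%, yet if every pod grows toward its limit at the same time, the node runs out of memory.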

Eviction of pods

From the resource management perspective, there are 2 situations where pods could be killed or evicted:

  1. A container in the pod attempts to use more memory than its limit.
  2. The node is under resource pressure.

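Pods could also be evicted for other reasons, such as pod priority and preemption, which I won't discuss in this post. When a container is OOM-killed, the kubelet restarts it according to the pod's restartPolicy. An evicted pod itself is not restarted; if it is managed by a controller such as a Deployment, the controller creates a replacement pod.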

When a pod with resource limits is scheduled on a node, the kubelet passes the limits to the container runtime as CPU/memory constraints, and the runtime applies them to the container's cgroup. When the container's memory usage exceeds its limit, the Linux kernel's OOM killer kicks in and kills it, and you will see OOMKilled in the status of the pod. The kernel enforces cgroup limits on its own, so it doesn't matter whether the node is under resource pressure or not.

On the other hand, the kubelet monitors the resource usage of the node. When usage reaches certain thresholds, the kubelet updates the node's conditions, tries to reclaim node-level resources, and, if that is not enough, evicts pods running on the node to reclaim resources. When the kubelet has to evict pods, it uses the following order to select which pod to evict first:

  1. Whether the pod’s resource usage exceeds its requests
  2. Pod priority
  3. The pod’s resource usage relative to its requests

The kubelet doesn't use the pod's QoS class directly to determine the eviction order. The QoS class is more of a tool that helps us humans estimate the likely eviction order; the key factor is the pod's resource requests. From the list above we can infer that, setting pod priority aside:

  • BestEffort pods are evicted first, because their usage always exceeds their requests (they have none), which also makes their usage relative to requests the largest.
  • Burstable pods are evicted next if their resource usage exceeds their requests.
  • Guaranteed pods, and Burstable pods whose usage doesn't exceed their requests, are last in the eviction order.

Although the QoS class doesn't affect how the kubelet determines the eviction order, it does affect the oom_score that the Linux kernel's OOM killer uses to decide which processes to kill when the node runs out of memory. The oom_score_adj value for each QoS class is shown in the table below.

QoS Class     oom_score_adj
Guaranteed    -997
BestEffort    1000
Burstable     2 – 999
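The Burstable range is not a fixed value: it depends on the container's memory request relative to the node's memory capacity. The Python sketch below follows the formula given in the Kubernetes node-pressure eviction documentation, `min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)`; the function name and sample sizes are illustrative.

```python
GiB = 1024 ** 3

# Fixed values from the table above.
OOM_SCORE_ADJ = {"Guaranteed": -997, "BestEffort": 1000}


def burstable_oom_score_adj(memory_request_bytes: int, node_memory_capacity_bytes: int) -> int:
    """oom_score_adj for a Burstable container: a larger memory request gives a lower score."""
    raw = 1000 - (1000 * memory_request_bytes) // node_memory_capacity_bytes
    return min(max(2, raw), 999)


print(burstable_oom_score_adj(1 * GiB, 16 * GiB))   # 938
print(burstable_oom_score_adj(0, 16 * GiB))         # 999 (no memory request)
print(burstable_oom_score_adj(16 * GiB, 16 * GiB))  # 2
```

A lower score means the kernel is less likely to pick that container's processes, so a Burstable container that requests more memory is better protected than one that requests little.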

Takeaways

Now we know how resource requests and limits work in Kubernetes. Here are some best practices you can follow when defining pods.

  • All pods should have resource requests and limits specified. You can use Kubernetes features such as resource quotas and limit ranges to enforce this on namespaces. If you are on AKS, you can also use Azure Policy to enforce it.
  • For critical pods whose chances of being evicted you want to minimize, make sure their QoS class is Guaranteed.
  • To reduce the impact of user pods on system pods, run system pods and user pods on separate nodes/node pools. If you are on AKS, create separate system and user node pools in the cluster.
  • If the computational resources of your Kubernetes cluster are not a constraint, enable HPA and the cluster autoscaler for the workloads.
  • Don't deploy any components or software on a Kubernetes node outside of Kubernetes. If you have to install additional components or software, use a Kubernetes-native approach, such as a DaemonSet.


Scaling with Application Gateway Ingress Controller

How Application Gateway Ingress Controller (AGIC) works is depicted in the following diagram from its documentation site.

AGIC Architecture

Rather than pointing the App Gateway backend pool at a Kubernetes service, AGIC updates the pool with the pods' IP addresses, and the gateway load-balances traffic to the pods directly. This simplifies the network configuration between the app gateway and the AKS cluster.

When the workload needs to scale out to handle increasing user load, two parts need to be considered: scaling the app gateway and scaling the pods.

Scaling for Application Gateway

Application Gateway supports autoscaling. With the default settings, it scales from 0 to 10 instances. However, setting the minimum instance count to 0 is not a good idea for a production environment. As mentioned in the high traffic support document, autoscaling takes 6 to 7 minutes to provision and scale out to additional instances. If the minimum instance count is too small, the app gateway may not be able to handle a traffic spike, and you may see HTTP 504 errors.

A reasonable minimum instance count should be based on the Current Compute Units metric. An app gateway instance can handle 10 compute units, so monitor this metric to decide how many instances you need as the minimum.
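The arithmetic is simple enough to sketch. The Python function below turns an observed peak of the compute units metric into a minimum instance count, assuming 10 compute units per instance (as stated above) and a percentage buffer on top; 20% to 30% is the buffer this post recommends. The function name and sample numbers are hypothetical.

```python
import math

COMPUTE_UNITS_PER_INSTANCE = 10


def min_instance_count(peak_compute_units: float, buffer_percent: int = 20) -> int:
    """Instances needed for the observed peak compute units, plus a safety buffer."""
    base = math.ceil(peak_compute_units / COMPUTE_UNITS_PER_INSTANCE)
    # Ceiling of base * (1 + buffer), computed with integers to avoid float rounding surprises.
    return -(-base * (100 + buffer_percent) // 100)


print(min_instance_count(45))      # 6: 5 instances for 45 CUs, plus a 20% buffer
print(min_instance_count(45, 30))  # 7
```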

Scaling for Pods

Kubernetes handles pod autoscaling if you use HPA. However, with AGIC you may see HTTP 502 errors when pods scale down. In fact, HTTP 502 errors can happen in the following 3 situations when AGIC is in place:

  • You scale down the pods either manually or via HPA.
  • You are doing a rolling update of the workload.
  • Kubernetes evicts pods.

The issue occurs because the app gateway backend pool cannot be updated fast enough to match the changes on the AKS side. This document has more details about the issue. It also discusses some workarounds, but the issue cannot be avoided entirely. You should be aware of potential HTTP 502 errors whenever you are in one of the above situations.

Recommendations

Now we know the issues we may face when the workload scales. Here are several recommendations that may help minimize errors when you expect to handle increasing user load.

  • Set proper values for the minimum and maximum instance counts of the app gateway. Add a 20% to 30% buffer to the minimum instance count.
  • For critical workloads, pre-scale the pods and temporarily disable HPA to avoid unexpected scale-downs before the peak load. Re-enable HPA or scale down the pods when the peak is over.
  • Ensure the AKS cluster has enough resources and the critical pods have the proper QoS class, so the pods won't be evicted unexpectedly.
  • Plan rolling updates for an appropriate time.

Enable Virtual Node on an Existing AKS Cluster

The virtual node can be enabled when you create a new AKS cluster, and there are documents about how to do it with either the Azure CLI or the Azure Portal. Since the virtual node is an AKS add-on, it can be enabled on existing AKS clusters as well, as long as the clusters use Azure CNI as the network plug-in.

The following is the procedure to enable the virtual node on an existing AKS cluster.

1. In the VNet that the AKS cluster is in, create a new subnet. The virtual node is based on Azure Container Instances (ACI). When container groups are deployed to a VNet, the subnet is delegated to ACI and can therefore only be used for container groups. So don't use a subnet that is used by other node pools.

2. Run the following command to enable the virtual node add-on.

az aks enable-addons -n <cluster-name> -g <resource-group-name> \
-a virtual-node --subnet-name <subnet-name>

3. When the command completes successfully, the virtual node is enabled on the cluster. You should see the virtual node when you run kubectl get nodes. If you check the cluster in the Azure Portal, you should see that virtual node pools are enabled on the Overview page. In the AKS node resource group, a managed identity is created for the ACI connector, and a network profile is created as well. You can view it with the command: az network profile list --resource-group <name of aks managed rg>.

If you cannot see the virtual node after the add-on is enabled, a possible reason is that the managed identity for the ACI connector doesn't have the proper permissions on the VNet. This can happen especially when the VNet is not in the node resource group. You can manually grant the managed identity the Contributor role on the VNet.

4. Deploy a pod to the virtual node by using a nodeSelector and tolerations, as in this sample. Follow the steps in the same document to test whether the pod works.

5. To remove the virtual node, follow the instructions in the remove virtual nodes section of the document. The virtual node also needs to be deleted with the kubectl delete node virtual-node-aci-linux command. See the sample below.

# Disable the virtual node add-on
az aks disable-addons -a virtual-node -g <resource-group-name> -n <cluster-name>
# Delete the virtual node from the cluster nodes
kubectl delete node virtual-node-aci-linux
# Delete the network profile
MRG=$(az aks show --resource-group <resource-group-name> \
  --name <cluster-name> --query nodeResourceGroup --output tsv)
NPID=$(az network profile list --resource-group $MRG --query '[0].id' --output tsv)
az network profile delete --id $NPID -y

The disable-addons command simply removes the virtual node add-on from the cluster. It doesn't drain or delete the virtual node. If there are pods running on the virtual node, those pods end up in an error state, and the underlying ACI container groups are not removed either. It's better to remove all pods from the virtual node before disabling the add-on.

Automate End-To-End UI Testing for Blazor WebAssembly App using Playwright

When I was developing the Azure Virtual Network Capacity Planner, I had to run the UI tests manually every time I made changes. It was troublesome and not very efficient. I wanted to automate all the end-to-end UI tests so that I wouldn't have to repeat them manually again. I also wanted to try Playwright, an open source E2E testing tool freshly baked by Microsoft.

However, the Blazor documentation is very brief regarding E2E testing. It doesn't describe a concrete approach we can follow for E2E testing of Blazor projects. So in this post, I'll explain in detail how to automate E2E UI testing for Blazor WebAssembly with Playwright, and hopefully it helps to close that gap.

The Host

Before we can automate the browser to run any tests, we need an in-memory web host for the site under test. For Blazor Server projects, this article from Gérald Barré explains very well how to host the app and test it with Playwright. Thanks to Gérald Barré for his excellent work; the main idea of this post comes from his article as well.

A Blazor WebAssembly app cannot be hosted in memory directly, because it doesn't include the server-side components a host needs. The NuGet package Microsoft.AspNetCore.Components.WebAssembly.DevServer helps us debug and test the project locally. However, it is an exe rather than a dll, so we cannot reference it from a test project. Thanks to OSS, though, we can create our own host server based on the DevServer source code. For example, the following snippet shows a version I created. The Startup class is copied from DevServer.

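using System.Collections.Generic;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

public class DevServer
{
    // Builds a generic host that serves the published Blazor WebAssembly app,
    // mirroring the official DevServer.
    public static IHost BuildWebHost(string[] args) =>
        Host.CreateDefaultBuilder(args)
            .ConfigureHostConfiguration(config =>
            {
                var inMemoryConfiguration = new Dictionary<string, string>
                {
                    [WebHostDefaults.EnvironmentKey] = "Development",
                    ["Logging:LogLevel:Microsoft"] = "Warning",
                    ["Logging:LogLevel:Microsoft.Hosting.Lifetime"] = "Information",
                };
                config.AddInMemoryCollection(inMemoryConfiguration);
            })
            .ConfigureWebHostDefaults(webBuilder =>
            {
                webBuilder.UseStaticWebAssets();
                // Startup is copied from the DevServer source code.
                webBuilder.UseStartup<Startup>();
            })
            .Build();
}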

We can then wire it up as an xUnit fixture and use it to host the Blazor WebAssembly app as a static website. See my code for more details. Because the app has to be hosted as a static website, we need to publish it first and then provide its output folder as the content root in the tests.

Using Playwright

Once the in-memory web host is ready, we can use Playwright to automate the E2E UI testing. You can find all the details about using PlaywrightSharp in Gérald Barré's post, so I won't repeat them here.

One of the best parts of Playwright is that it supports multiple languages, one of which is Python. With Playwright for Python, we can record user interactions in the browser and generate the corresponding code for the test project. It generates code not only in Python but also in C# and JavaScript. Simply run python -m playwright codegen --target csharp to generate the code in C#, then copy it into the test project, and we can create test cases quickly.

The following is a screencast of a test case running with Playwright in headful mode with slow mo. You can find the complete test project in my repo.

Running the tests in the build pipeline

To integrate the E2E tests into the build pipeline, we can simply run dotnet test after dotnet publish. Because the output folder of dotnet publish on the build agent could differ from the one on the local machine, I made the content root configurable in the test project by adding a testsettings.json file. With it, I can run the tests both on the local machine and on the build agent.

Azure Virtual Network Capacity Planner

When you implement a landing zone or deploy workloads on Azure, the Virtual Network (VNet) is usually the most fundamental Azure resource, one you need to plan and deploy before other resources. Among all the important aspects you need to plan for a VNet, one of the most basic is the address space of the VNet and its subnets. You need to ensure the VNet has plenty of address space for your workloads and future growth, while also trying not to waste IP addresses. When you integrate Azure resources with the VNet, different resources may have different scaling patterns and therefore require subnets with different address spaces. So it's important to plan the subnets based on the requirements of the resources you will deploy.

Customers often ask me what address space they need to plan for the VNet and subnets to run their workloads. To simplify this planning task, I created a tool, Azure Virtual Network Capacity Planner. With this tool, you create subnets for the Azure resources you need to deploy, and it calculates the address space the subnets need and, eventually, the address space of the VNet. You can then export the result to an ARM template or a CSV file for the actual deployment.
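The core of such a calculation can be sketched in a few lines of Python. This is not the planner's actual code, only an illustration under two stated facts and one assumption: Azure reserves 5 addresses in every subnet, /29 is the smallest subnet Azure allows, and (assumption) subnets are packed largest-first so each one stays aligned on its own boundary. The subnet names and sizes in the example are hypothetical.

```python
import ipaddress

AZURE_RESERVED_IPS = 5  # Azure reserves 5 addresses in every subnet
SMALLEST_PREFIX = 29    # /29 is the smallest subnet Azure allows


def subnet_prefix_for(ip_count: int) -> int:
    """Longest prefix whose subnet still leaves ip_count usable addresses."""
    needed = ip_count + AZURE_RESERVED_IPS
    host_bits = (needed - 1).bit_length()  # smallest k with 2**k >= needed
    return min(32 - host_bits, SMALLEST_PREFIX)


def plan_vnet(vnet_cidr: str, requirements: dict[str, int]) -> dict[str, str]:
    """Carve subnets out of the VNet, largest first, so each subnet stays aligned."""
    vnet = ipaddress.ip_network(vnet_cidr)
    plan = {}
    next_address = vnet.network_address
    for name, count in sorted(requirements.items(), key=lambda kv: -kv[1]):
        subnet = ipaddress.ip_network(f"{next_address}/{subnet_prefix_for(count)}")
        if not subnet.subnet_of(vnet):
            raise ValueError(f"subnet {name} does not fit in {vnet_cidr}")
        plan[name] = str(subnet)
        next_address = subnet.broadcast_address + 1
    return plan


print(plan_vnet("10.0.0.0/22", {"aks-nodes": 250, "appgw": 20, "aci": 50}))
# {'aks-nodes': '10.0.0.0/24', 'aci': '10.0.1.0/26', 'appgw': '10.0.1.64/27'}
```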

The following is a screenshot of the planner.

Azure Virtual Network Capacity Planner

The VNet planner is built with Blazor. You can find its source code in this repo. Feel free to raise issues or PRs if you have any feedback or better ideas.

User Groups in Azure API Management

In Azure API Management, there are 3 built-in groups: Administrators, Developers, and Guests. These groups are meant for the developer portal to authorize developer accounts. Based on which group a developer account is in, the developer portal controls which APIs the developer can see. The groups have nothing to do with the actual access control of the API endpoints in APIM.

According to this document, the built-in groups are immutable, and their membership is managed by APIM. You can neither add users to or remove users from them, nor modify the groups themselves. The subscription administrators are members of the Administrators group. It used to be possible to add a user account to the Administrators group by assigning it the API Management Service Contributor role, but that is no longer the case. The users you add in APIM are members of the Developers group, and unauthenticated users of the developer portal fall under the Guests group.

Besides the built-in groups, there is a built-in Administrator account, which is immutable as well. You can neither delete it nor change its properties. Its email address is the one you entered as the Administrator email when you provisioned the APIM instance. There is no way to change the other properties of this account, such as its first name, last name, or password; there is no UI for that. And if you try to do it via the management API, you get an HTTP 405 Method Not Allowed error. So choose the Administrator email carefully when provisioning the APIM instance.

If you really have to make changes to the built-in Administrator account, contact Azure Support.

Update:

It appears that the Administrator's email can be changed through Notification templates > Email settings. However, using this option causes a short APIM downtime (the service shows "Service is being updated…" for several minutes). Be careful.

Using SQL Always Encrypted with Entity Framework

I created some sample code to demonstrate how to use SQL Always Encrypted with Entity Framework. The sample assumes that SQL Always Encrypted is configured with Azure Key Vault and that a Service Principal has permissions to access the keys in the Key Vault.

The key part of the code is as follows (it is for .NET Core; the code for .NET Framework is quite similar). Unlike the sample in the linked document, the AAD authentication is implemented with MSAL rather than ADAL.

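using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;
using Microsoft.Data.SqlClient.AlwaysEncrypted.AzureKeyVaultProvider;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Identity.Client;

// Members of the Startup class.
// Static so that the static authentication callback below can read them.
private static string spClientId;
private static string spClientSecret;

public void ConfigureServices(IServiceCollection services)
{
    spClientId = Configuration.GetValue<string>("spClientId");
    spClientSecret = Configuration.GetValue<string>("spClientSecret");

    // Register Azure Key Vault as the column master key store provider.
    var azureKeyVaultProvider = new SqlColumnEncryptionAzureKeyVaultProvider(AADAuthenticationCallback);
    SqlConnection.RegisterColumnEncryptionKeyStoreProviders(new Dictionary<string, SqlColumnEncryptionKeyStoreProvider>
    {
        { SqlColumnEncryptionAzureKeyVaultProvider.ProviderName, azureKeyVaultProvider }
    });

    services.AddDbContext<TodoContext>(
        options => options.UseSqlServer(Configuration.GetConnectionString("TodoDBConnection"))
    );

    ...
}

// Called by the provider whenever it needs an access token for Key Vault.
private static async Task<string> AADAuthenticationCallback(string authority, string resource, string scope)
{
    // Acquire a token for the service principal with MSAL (client credentials flow).
    var clientApp = ConfidentialClientApplicationBuilder
        .Create(spClientId)
        .WithClientSecret(spClientSecret)
        .WithAuthority(authority)
        .Build();
    var scopes = new[] { resource + "/.default" };
    var authResult = await clientApp.AcquireTokenForClient(scopes).ExecuteAsync();
    if (authResult == null)
    {
        throw new Exception("Failed to acquire the access token.");
    }

    return authResult.AccessToken;
}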

One thing to note is the package of the AzureKeyVaultProvider for Always Encrypted. For EF Core, the package is Microsoft.Data.SqlClient.AlwaysEncrypted.AzureKeyVaultProvider. But for EF6, Microsoft.SqlServer.Management.AlwaysEncrypted.AzureKeyVaultProvider should be used. That is because Microsoft.Data.SqlClient is not compatible with EF6.

Azure Storage Static Website and Application Gateway Integration

When discussing custom domain names and SSL for Azure Storage static websites, the Microsoft docs mention using Azure CDN to achieve them. Besides Azure CDN, another option is to put an Application Gateway in front of the storage static website.

To integrate the Azure Storage Static Website with an Application Gateway, the following configurations need to be applied.

  • On the storage account, allow traffic from the VNet and subnet of the application gateway. Enable the service endpoint, Microsoft.Storage, on the subnet of the application gateway.
  • In the configurations of the application gateway, configure the backend pool as follows:
    • Target type: IP address or FQDN
    • Target: FQDN of the static website. e.g. <web-name>.<zone>.web.core.windows.net
  • Configure the HTTP Settings as follows:
    • Backend protocol: HTTPS
    • Use well known CA certificate: Yes
    • Override with new host name: Yes
    • Override with specific domain name: FQDN of the static website. e.g. <web-name>.<zone>.web.core.windows.net
  • Create a request routing rule with the above backend pool and HTTP settings.

With the above settings, check the backend health; it should show a healthy status, and the static website should be accessible through the application gateway.

Use Markdown in Outlook

When I composed emails in Outlook and wanted to include URL links or code snippets in the email body, I had to click multiple times or adjust the style of the email body. It was not very efficient. I was hoping I could use markdown for this kind of content in the email body, but unfortunately Outlook doesn't support markdown natively, and I couldn't find a usable add-in or extension that does it either. So I created one for myself. Here is a screenshot of it.

Outlook Add-in

It works both on the desktop and in the browser.

In the browser

The source code and the instructions for using it are in this GitHub repo. An Office add-in consists of static web resources such as JavaScript, CSS, and HTML, so it can be hosted on any static web hosting service. There is a document about deploying an add-in to an Azure Storage static website. I chose to host it on GitHub Pages, which has several advantages; for example, it's easy to support both a custom domain and an SSL certificate.

It seems Office AppSource doesn't allow individual developers to submit apps; it requires a Partner account, which I don't have. So to use the add-in, you have to sideload it in Outlook.

Implement a Snowflake Id Service with Azure Functions

Twitter's snowflake ID is a popular solution for generating distributed IDs. It is easy to understand and implement, and many implementations in different languages can be found on the internet. It doesn't rely on any special infrastructure, so it is easy to scale.

I created a short demo that implements a snowflake ID service with Azure Functions. The code is quite simple. For testing, I deployed it to 2 different regions behind a Front Door, with 2 instances of the function app in each region. It scales quite well; in my tests, the generated IDs were always unique and monotonic.
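For readers new to the scheme, here is a Python sketch of the commonly used snowflake layout: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit per-millisecond sequence. It is not the demo's code. The Twitter epoch constant and the injectable `clock` parameter are assumptions made for illustration and testing. Within one worker the IDs strictly increase as long as the clock doesn't move backwards; uniqueness across hosts comes from each host having a distinct worker ID.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by Twitter's original snowflake
WORKER_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_ID_BITS) - 1
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # All 4096 sequence numbers used in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - TWITTER_EPOCH_MS) << (WORKER_ID_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


generator = SnowflakeGenerator(worker_id=1)
first, second = generator.next_id(), generator.next_id()
print(first < second)  # True
```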

In the code, I used Table storage to maintain the worker ID for each host of the function app. The host ID of the function app can be retrieved from the environment variable WEBSITE_INSTANCE_ID. When the function app is called, the code queries the table to get the worker ID for the host. If the worker ID doesn't exist in the table, it creates one. In this way, the scaling of the function app is handled.

In the first version of the code, I used a static class for the function and put the above table query logic in it, which meant at least 1 table query for every function call. But the worker ID doesn't change for a host once it is retrieved, so it only needs to be queried once when the host starts. So in the current version, I leveraged the dependency injection of Azure Functions to register the worker as a singleton service. In this way, it is initialized only once per host and reused for all function calls to that host.

There are two small issues with DI, though. First, it seems hard to debug the DI code in the startup locally: the breakpoint never seems to be hit, and the logger, as mentioned in the document, is not ready to be used in the startup. I'm not sure how to troubleshoot this effectively. Second, DI is only available for .NET Core; there seems to be no equivalent for other languages as of now.
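The "query once per host, then reuse" idea can be sketched independently of Azure Functions. In the Python sketch below, `WorkerIdTable` is a hypothetical in-memory stand-in for the Table storage table, and `WorkerIdProvider` plays the role of the singleton service: it resolves the worker ID for its host (from `WEBSITE_INSTANCE_ID`, falling back to a made-up `"local-dev"` value) on first use and caches it. A real Table storage implementation would also need to handle two hosts racing to claim the same worker ID, for example with optimistic concurrency, which this sketch leaves out.

```python
import os
import threading
from typing import Optional


class WorkerIdTable:
    """Hypothetical in-memory stand-in for the host ID -> worker ID table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()
        self.queries = 0  # counts lookups, to show how often the table is hit

    def get_or_create(self, host_id: str) -> int:
        with self._lock:
            self.queries += 1
            if host_id not in self._rows:
                # Assign the next unused worker ID to a newly seen host.
                self._rows[host_id] = len(self._rows)
            return self._rows[host_id]


class WorkerIdProvider:
    """Singleton-style service: resolves the worker ID once per host, then caches it."""

    def __init__(self, table: WorkerIdTable, host_id: Optional[str] = None):
        self._table = table
        self._host_id = host_id or os.environ.get("WEBSITE_INSTANCE_ID", "local-dev")
        self._worker_id: Optional[int] = None
        self._lock = threading.Lock()

    @property
    def worker_id(self) -> int:
        if self._worker_id is None:
            with self._lock:
                if self._worker_id is None:  # double-checked so only one caller queries
                    self._worker_id = self._table.get_or_create(self._host_id)
        return self._worker_id


table = WorkerIdTable()
host_a = WorkerIdProvider(table, host_id="host-a")
host_b = WorkerIdProvider(table, host_id="host-b")
ids = [host_a.worker_id for _ in range(100)] + [host_b.worker_id]
print(table.queries)  # 2: one lookup per host, however many calls are made
```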

The code is just for demo purposes. Use it at your own risk.