Enable Virtual Node on an Existing AKS Cluster

The virtual node can be enabled when you create a new AKS cluster. There are documents talking about how to do it with either the Azure CLI or Azure Portal. Since the virtual node is an AKS add-on, it can be enabled on existing AKS clusters as well, as long as the clusters are using Azure CNI as the network plug-in.

The following is the procedure of how to enable the virtual node for an existing AKS cluster.

1. In the VNET which the AKS cluster is in, create a new subnet. The virtual node is based on Azure Container Instance (ACI). In the scenario of deploying container groups to a VNET, the subnet will be delegated to ACI and therefore can only be used for container groups. So don’t use the subnets that are used by other node pools.

2. Run the following command to enable the virtual node add-on.

az aks enable-addons -n <cluster-name> -g <resource-group-name> \
-a virtual-node --subnet <subnet-name>

3. When the command completes successfully, the virtual node is enabled on the cluster. You should see the virtual node when you use kubectl get nodes. If you check the cluster status in Azure Portal, you should see the virtual node pools is enabled on the Overview page. In the AKS node resource group, a managed identity is created for the ACI connector. And the network profile is also created. You can view it with the command: az network profile list --resource-group <name of aks managed rg>.

In case you cannot see the virtual node after the add-on is enabled, a possible reason is the managed identity for the ACI connector doesn’t have the proper permission to the vnet. It could happen especially when the vnet is not in the node resource group. You can manually grant contributor permission of the vnet to the managed identity.

4. Deploy a pod to the virtual node by using nodeSelector and toleration such as this sample. Follow the steps in the same document to test if the pod works.

5. To remove the virtual node, follow the instructions in remove virtual nodes section of the document. The virtual node also needs to be removed with kubectl delete node virtual-node-aci-linux command. See a sample below.

# Disable the virtual node add-on
az aks disable-addons -a virtual-node -g <resource-group-name> -n <cluster-name>
# Delete the virtual node from the cluster nodes
kubectl delete node virtual-node-aci-linux
# Delete the network profile
MRG=$(az aks show --resource-group <resource-group-name> \
  --name <cluster-name> --query nodeResourceGroup --output tsv)
NPID=$(az network profile list --resource-group $MRG --query '[0].id' --output tsv)
az network profile delete --id $NPID -y

The disable-addon command will simply remove the virtual node add-on from the cluster. It doesn’t drain and delete the virtual node. If there are pods running on the virtual node, those pods would be ended up being in the error state, and the underlying ACI would not be removed as well. It’s better to remove all pods before disabling the add-on.

Automate End-To-End UI Testing for Blazor WebAssembly App using Playwright

When I was developing the Azure Virtual Network Capacity Planner, I had to run the UI testing manually every time when I made some changes. It was a bit troublesome and not very efficient. I’d like to automate all the end-to-end UI testing so that I don’t have to repeat them manually again. Meanwhile I also wanted to try Playwright which is an open source E2E testing tool freshly baked from Microsoft.

However, the Blazor document is very brief regarding to the E2E testing. It doesn’t mention a concrete approach that we can follow to do the E2E testing for Blazor projects. So in this post, I’ll talk about in detail how we can automate E2E UI testing for Blazor WebAssembly with Playwright, and hopefully it can help to narrow the gap.

The Host

Before we can automate the browser to do any tests, we need a web host running in the memory for the site that will be tested. For the Blazor Server project, this article from Gérald Barré talks about how you can host it and test it with Playwright very well. Actually, thanks to Gérald Barré for his excellent work, the main idea of this post is coming from his article as well.

For the Blazor WebAssembly app, we cannot host it directly in the memory because it doesn’t include the necessary server-side components that are needed for a host. The NuGet package Microsoft.AspNetCore.Components.WebAssembly.DevServer helps us debug and test the project locally. However, it is an exe rather than a dll. We cannot reference and use it in a testing project. But thanks to OSS, we can create our own host server based on the source code the DevServer. For example, the following snippet shows a version that I created. The Startup is a copy from DevServer.

public class DevServer
{
     public static IHost BuildWebHost(string[] args) =>
         Host.CreateDefaultBuilder(args)
         .ConfigureHostConfiguration(config =>
         {
             var inMemoryConfiguration = new Dictionary
             {
                 [WebHostDefaults.EnvironmentKey] = "Development",
                 ["Logging:LogLevel:Microsoft"] = "Warning",
                 ["Logging:LogLevel:Microsoft.Hosting.Lifetime"] = "Information",
             };
             config.AddInMemoryCollection(inMemoryConfiguration);
         })
         .ConfigureWebHostDefaults(webBuilder =>  
         {
             webBuilder.UseStaticWebAssets();
             webBuilder.UseStartup<Startup>();
         }).Build();
 }

We can then wire it up as a xUnit fixture and use it to host the Blazor WebAssembly app as a static website. See my code for more details. As the Blazor WebAssembly app has to be hosted as a static website, we need to publish it first and then provide its output folder as the content root in the tests.

Using Playwright

When we have the in-memory web host ready, we can use Playwright to automate the E2E UI testing. You can find all details about how to use PlaywrightSharp in Gérald Barré’s post. I won’t repeat it here.

One of the best parts of Playwright is it supports multiple languages. One of them is Python. With Playwright for Python, we can record the user interactions in the browser and generate the code that can be used in the test project accordingly. And it does not only generate code in Python, but in C# and JavaScript as well. Simply use the command: python -m playwright codegen --target csharp to generate the code in C# and then copy the code to the test project, we can create test cases quickly.

The following is a screencast of running a test case with Playwright headful in slow mo. For a completed test project, please find it in my repo.

Running the testing in the build pipeline

To integrate the E2E testing with the build pipeline, we can simply run dotnet test after the dotnet publish. As the output folder of dotnet publish on the build agent could be different from the one on the local machine, I made the content root configurable in the test project by adding a testsettings.json file. With it, I can run the tests from both the local machine and the build agent.

Azure Virtual Network Capacity Planner

When you implement a landing zone or deploy workloads on Azure, Virtual Network (VNet) is usually the very fundamental Azure resource that you need to plan and deploy first before other resources. Among all other important aspects that you need to plan for a VNet, there is a basic one: the address space of the VNet and its subnets. You need to ensure the VNet has the plenty of the address space for your workloads and the future growth. At the same time, you should also try not to waste IP addresses. When you integrate Azure resources with the VNet, different resources may have different scaling patterns and therefore require different address space for the subnet. So, it’s important to plan the subnets based on the requirements of the resources that you will deploy.

Quite often I got asked by customers about what kind of address space they need to plan for the VNet and subnets to run their workloads. To simplify this planning task, I created a tool, Azure Virtual Network Capacity Planner. With this tool, you create subnets for the Azure resources that you need to deploy. It helps you calculate the address space the subnets need and eventually the address space of the VNet. You can then export the result into the ARM template or CSV file for the actual deployment.

The following is a screenshot of the planner.

Azure Virtual Network Capacity Planner

The VNet planner is built with Blazor. You can find its source code in this repo. Feel free to raise issues or PR if you have any feedback or better idea.

User Groups in Azure API Management

In Azure API Management, there are 3 built-in groups: Administrators, Developers and Guests. These groups are meant for the developer portal to do the authorization for developer accounts. Based on which group a developer account is in, the developer portal controls what APIs the developer can see. The groups have nothing to do with the actual access control of the API endpoints in APIM.

According to this document, the built-in groups are immutable. Their membership is managed by APIM. You can neither add or remove users to them nor modify the groups themselves. The subscription administrators are the members of the Administrator group. It used to be possible to add a user account to the Administrators group by assigning the Api Management Service Contributor role to it. But it is not the case anymore. The users you add in the APIM are the members of Developers group. The unauthenticated users of the developer portal fall under the Guests group.

Besides the built-in groups, there is a built-in Administrator account which is immutable as well. You can neither delete it nor change its properties. Its email address is the one that you input as the Administrator email when you provision the APIM instance. There is no way for you to create or change other properties of this account, such as its first name, last name, or password etc. There is no UI for that. And if you tried to do it via the management API, you would get HTTP 405 Method Not Allow error. So be careful to choose the Administrator email when provisioning the APIM instance.

In the situation where you really have to make changes to the built-in Administrator account, try to contact Azure Support then.

Update:

It appears that the Administrator’s email can be changed through Notification templates > Email settings. However, using this option would cause a short downtime to APIM (Service is being updated… for several minutes). Be careful.

Using SQL Always Encrypted with Entity Framework

I created some sample code to demonstrate how to use SQL Always Encrypted with Entity Framework. The sample assumed the SQL Always Encrypted is configured with Azure Key Vault, and a Service Principal has permissions to access the keys in the Key Vault.

The key part of the code is as follows (it is for .NET Core. The code for .NET Framework is quite similar). Not like the sample in the above linked document, the AAD authentication is implemented with MSAL rather than ADAL.

public void ConfigureServices(IServiceCollection services)
 {
     spClientId = Configuration.GetValue("spClientId");
     spClientSecret = Configuration.GetValue("spClientSecret");
     SqlColumnEncryptionAzureKeyVaultProvider azureKeyVaultProvider = new SqlColumnEncryptionAzureKeyVaultProvider(AADAuthenticationCallback); 
     SqlConnection.RegisterColumnEncryptionKeyStoreProviders(new Dictionary<string, SqlColumnEncryptionKeyStoreProvider> 
     {
         { SqlColumnEncryptionAzureKeyVaultProvider.ProviderName, azureKeyVaultProvider }
     }); 
     services.AddDbContext<TodoContext>(
         options => options.UseSqlServer(Configuration.GetConnectionString("TodoDBConnection"))
     );
     
     ...
 }

 private static async Task AADAuthenticationCallback(string authority, string resource, string scope)
 {
     var clientApp = ConfidentialClientApplicationBuilder
         .Create(spClientId)
         .WithClientSecret(spClientSecret)
         .WithAuthority(authority)
         .Build();
     var scopes = new[] { resource + "/.default" };
     var authResult = await clientApp.AcquireTokenForClient(scopes).ExecuteAsync();
     if (authResult == null)
     {
         throw new Exception("Failed to acquire the access token.");
     }

     return authResult.AccessToken;
 }

One thing to note is the package of the AzureKeyVaultProvider for Always Encrypted. For EF Core, the package is Microsoft.Data.SqlClient.AlwaysEncrypted.AzureKeyVaultProvider. But for EF6, Microsoft.SqlServer.Management.AlwaysEncrypted.AzureKeyVaultProvider should be used. That is because Microsoft.Data.SqlClient is not compatible with EF6.

Azure Storage Static Website and Application Gateway Integration

When talking about the custom domain name and SSL for the Azure Storage Static website, ms docs mentioned about using Azure CDN to achieve it. Besides Azure CDN, another option is to use an Application Gateway in front of the storage static website.

To integrate the Azure Storage Static Website with an Application Gateway, the following configurations need to be applied.

  • On the storage account, allow the traffic from the VNET and subnet of the application gateway. Enable the service endpoint, Microsoft.Storage, on the subnet of the application gateway.
  • In the configurations of the application gateway, configure the backend pool as follows:
    • Target type: IP address or FQDN
    • Target: FQDN of the static website. e.g. <web-name>.<zone>.web.core.windows.net
  • Configure the HTTP Settings as follows:
    • Backend protocol: HTTPS
    • Use well known CA certificate: Yes
    • Override with new host name: Yes
    • Override with specific domain name: FQDN of the static website. e.g. <web-name>.<zone>.web.core.windows.net
  • Create a request routing rule with the above backend pool and the http settings.

With the above settings, check the backend health, it should shows the healthy status. And the static website should be accessible through the application gateway.

Use Markdown in Outlook

When I composed emails in Outlook and wanted to include some URL links or code snippets in the email body, I had to click multiple times or adjusted the style of the email body. It was not very efficient. I was hoping that I could use markdown for these kind of content in the email body. But unfortunately, Outlook doesn’t support markdown natively. I could not seem to find a usable add-in or extension which can do it too. So, I created one for myself. Here is a screenshot of it.

Outlook Add-in

It works both on the desktop and in the browser.

In the browser

The source code and the instruction of how to use it are in this GitHub repo. An Office add-in consists of some static web resources such as JavaScript, css and html. It can be hosted with any static web hosting service. There is a document which talks about deploying an add-in to an Azure Storage static website. I choose to host it with the GitHub Pages. There are several advantages with GitHub Pages, such as it’s easy to support both the custom domain and the SSL cert.

It seems Office AppSource doesn’t allow individual developers to submit apps. It requires a Partner account which I don’t have. So to use the add-in, it has to be sideloaded in Outlook.

Implement a Snowflake Id Service with Azure Functions

The snowflake id from twitter is a popular solution for the generation of the distributed ids. It is easy to understand and to implement. There are many implementations in different languages that can be found on the internet. It doesn’t rely on any special infrastructure so is easy to scale.

I created a short demo to implement a snowflake id service with Azure Functions. The code is quite simple. For testing, I deployed the code to 2 different regions behind a Front Door and have 2 instances for the function app of each region. It scales quite well. The id generated is always unique and monotonic.

In the code, I used a table storage to maintain the worker id for each host of the function app. The host id of the function app can be retrieved with the environment variable, WEBSITE_INSTANCE_ID. When the function app is called, the code queries the table to get the worker id for the host. If the worker id doesn’t exist in the table, it creates one. In this way, the scaling of the function app can be handled.

In the first version of the code, I used a static class for the function and put the above table query logic in it. It means that there is at least 1 table query for every function call. But the worker id should not change for a host once it is retrieved. It should only be queried once when the host starts. So in the current version, I leveraged the dependency injection of Azure Functions to make the worker as a singleton service. In this way, it is only initialized once for a host and reused for all function calls to that host.

There are two small issues with DI though. It seems hard to debug the DI code in the startup locally. The breakpoint seems never to be hit, and the logger, as it is mentioned in the document, is not ready to be used in the startup. Not sure how to troubleshoot it effectively. On the other hand, DI is only available for .NET Core. There seems no equivalent for other languages as of now.

The code is just for demo purpose. Use it at your own risk.

WCF on App Service – Is client certificate authentication possible?

In WCF services, the client certificate authentication, or in WCF term the transport security with certificate authentication, is one of the common ways for authentication. Both of the basicHttpBinding and the wsHttpBinding support it. These bindings rely on IIS to implement the client cert authentication. So usually there are two steps to enable the client cert authentication for a WCF service when deploying it on IIS:

  1. Configure the transport security for the binding of the WCF service, like example below.
<bindings>  
    <wsHttpBinding>  
        <binding>  
            <security mode="Transport">  
                <transport clientCredentialType="Certificate"/>
            </security>  
        </binding>  
    </wsHttpBinding>  
</bindings>  

2. Enable the client certificates on IIS.

But when you deploy the same WCF to Azure App Service and enable the client certificates in the settings of App Service, you may find that it doesn’t work. You would probably see this error: The SSL settings for the service 'SslRequireCert' does not match those of the IIS 'None'. This error means that the client cert is configured for transport security but not configured on IIS.

What is the reason behind? The document, Configure TLS mutual authentication for Azure App Service, tells us the reason.

In App Service, TLS termination of the request happens at the frontend load balancer. When forwarding the request to your app code with client certificates enabled, App Service injects an X-ARR-ClientCert request header with the client certificate. App Service does not do anything with this client certificate other than forwarding it to your app. Your app code is responsible for validating the client certificate.

When you enable the client cert in the settings of App Service, it turns on the client cert auth at the frontend load balancer layer of App Service. The IIS servers which host the WCF service don’t have it enabled. App Service expects the applications to handle and validate the certificate by themselves, which is not what WCF expects. So it results the error.

Now back to our question, is client certificate authentication possible for WCF services running on App Service? Looks like it is not possible with the default HTTP bindings of WCF. To make it possible, you would have to develop your own custom service behaviors and custom bindings to handle and validate the client cert from the custom HTTP header sent by App Service. You would have to customize the bindings and behaviors for both the server and the client.

WCF has ended its journey and is not in the plan of future .NET core anyway. So comparing to develop custom behaviors and bindings, a better option would be to migrate away from WCF to other technologies, like ASP.NET Core + gRPC for example.

Deploy Drupal on Azure VMSS with Ansible

To run Drupal in Azure, we have three options for the infrastructure at the moments, VMs, VMSS and AKS. VMs could be the most straight forward way if you migrate Drupal from on-prem servers. VMSS would be a better option if you want to enjoy the benefit of cloud such as auto scaling. AKS would be the best option if your team is ready to develop and manage containers on Kubernetes.

I put together a set of Ansible playbooks which can be used to deploy a testing environment with Drupal running on Azure VMSS and using Azure Database for MariaDB as the backend database. The code is in this repo. It is just to test and demonstrate the deployment of Drupal on VMSS can be automated with Ansible. It can also be done with ARM template like what this sample shows.

When running the playbooks one by one, the following components will be deployed.

  • A vnet with 2 subnets in a resource group.
  • A 2 nodes GlusterFS cluster as the file storage for Drupal.
  • An Azure Database for MariaDB instance with a database on it.
  • A VMSS with a basic load balancer. The VMs in the VMSS are configured to run GlusterFS client, PHP, Nginx and Drupal sites.

One challenge of the deployment is the file storage for Drupal files. When deploying Drupal on multiple servers, it requires a coherent and synchronized file storage. I considered the Azure Files firstly, but it didn’t work well in VMSS scenario. As the deployment script runs on all VMs in VMSS at the same time, the 1st VM creates a file in the shared storage as a lock. Other VMs wait for the 1st VM to complete. Azure Files seems not fast enough in this case so every VM thinks it is the 1st.

I ended up using GlusterFS for the purpose. It works well but I have to deploy additional VMs and configure the GlusterFS cluster. Hope the coming NFS features of Azure Storage would be able to be used to replace the GlusterFS in the future.