WCF on App Service – Is client certificate authentication possible?

In WCF services, client certificate authentication (in WCF terms, transport security with certificate authentication) is one of the common ways to authenticate clients. Both basicHttpBinding and wsHttpBinding support it. These bindings rely on IIS to implement client certificate authentication, so there are usually two steps to enable it for a WCF service deployed on IIS:

  1. Configure transport security on the binding of the WCF service, as in the example below.
<bindings>  
    <wsHttpBinding>  
        <binding>  
            <security mode="Transport">  
                <transport clientCredentialType="Certificate"/>
            </security>  
        </binding>  
    </wsHttpBinding>  
</bindings>  

  2. Enable client certificates in IIS.

But when you deploy the same WCF service to Azure App Service and enable client certificates in the App Service settings, you may find that it doesn’t work. You would probably see this error: The SSL settings for the service 'SslRequireCert' does not match those of the IIS 'None'. The error means that the client certificate is required by the transport security configuration but is not enabled in IIS.

What is the reason behind this? The document Configure TLS mutual authentication for Azure App Service explains it:

In App Service, TLS termination of the request happens at the frontend load balancer. When forwarding the request to your app code with client certificates enabled, App Service injects an X-ARR-ClientCert request header with the client certificate. App Service does not do anything with this client certificate other than forwarding it to your app. Your app code is responsible for validating the client certificate.

When you enable client certificates in the App Service settings, client certificate authentication is turned on at the frontend load balancer layer of App Service. The IIS servers that host the WCF service don’t have it enabled. App Service expects applications to handle and validate the certificate themselves, which is not what WCF expects, and that mismatch results in the error.

Now back to our question: is client certificate authentication possible for WCF services running on App Service? It does not look possible with the default HTTP bindings of WCF. To make it work, you would have to develop your own custom service behaviors and custom bindings that handle and validate the client certificate from the custom HTTP header sent by App Service. You would have to customize the bindings and behaviors for both the server and the client.
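To give a feel for what "handle and validate the client certificate from the header" involves, here is a minimal, language-neutral sketch in Python. It assumes the X-ARR-ClientCert header carries the base64-encoded certificate and that pinning a known SHA-1 thumbprint is acceptable for the scenario. The function names and the allow-list are hypothetical, and a real WCF implementation would do the equivalent in a custom behavior in C#.

```python
import base64
import binascii
import hashlib


def client_cert_thumbprint(header_value: str) -> str:
    """Return the SHA-1 thumbprint of the certificate in the header value."""
    # Assumption: the header holds the base64-encoded certificate bytes.
    cert_bytes = base64.b64decode(header_value, validate=True)
    return hashlib.sha1(cert_bytes).hexdigest().upper()


def is_trusted(header_value, allowed_thumbprints) -> bool:
    """Accept the request only if the certificate thumbprint is allow-listed."""
    if not header_value:
        return False  # no client certificate was forwarded
    try:
        thumbprint = client_cert_thumbprint(header_value)
    except (binascii.Error, ValueError):
        return False  # header is not valid base64
    return thumbprint in allowed_thumbprints
```

Thumbprint pinning is only one possible check. A fuller validation would also look at the validity period, the issuer and the certificate chain.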

WCF has reached the end of its road and is not planned for future versions of .NET Core anyway. So compared with developing custom behaviors and bindings, a better option would be to migrate away from WCF to other technologies, for example ASP.NET Core with gRPC.

Deploy Drupal on Azure VMSS with Ansible

To run Drupal in Azure, we have three infrastructure options at the moment: VMs, VMSS and AKS. VMs could be the most straightforward choice if you migrate Drupal from on-prem servers. VMSS would be a better option if you want to enjoy cloud benefits such as autoscaling. AKS would be the best option if your team is ready to develop and manage containers on Kubernetes.

I put together a set of Ansible playbooks that deploy a test environment with Drupal running on Azure VMSS and using Azure Database for MariaDB as the backend database. The code is in this repo. It is only meant to test and demonstrate that the deployment of Drupal on VMSS can be automated with Ansible. The same can also be done with an ARM template, as this sample shows.

Running the playbooks one by one deploys the following components.

  • A vnet with 2 subnets in a resource group.
  • A 2-node GlusterFS cluster as the file storage for Drupal.
  • An Azure Database for MariaDB instance with a database on it.
  • A VMSS with a basic load balancer. The VMs in the VMSS are configured to run GlusterFS client, PHP, Nginx and Drupal sites.

One challenge of the deployment is the file storage for Drupal files. When Drupal is deployed on multiple servers, it requires coherent and synchronized file storage. I first considered Azure Files, but it didn’t work well in the VMSS scenario. The deployment script runs on all VMs in the VMSS at the same time: the first VM creates a file in the shared storage as a lock, and the other VMs wait for the first VM to complete. Azure Files does not seem to be fast enough in this case, so every VM thinks it is the first.

I ended up using GlusterFS for this purpose. It works well, but I have to deploy additional VMs and configure the GlusterFS cluster. Hopefully the upcoming NFS features of Azure Storage can replace GlusterFS in the future.
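For illustration, the "first VM wins" lock can be sketched as an exclusive file creation. This is not the script from the repo, only a minimal Python sketch of the idea; the function name and lock path are hypothetical. Whether exclusive creation is truly atomic depends on the underlying file system, which is an assumption that may not hold on every network share.

```python
import os


def try_acquire_lock(lock_path: str) -> bool:
    """Return True only for the caller that creates the lock file first."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking for
        # the file and creating it happen in a single step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another node already holds the lock
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record the owner for debugging
    return True
```

A script that instead checks for the file and then creates it in two separate steps leaves a window in which several nodes can all see "no lock yet", which would be consistent with every VM believing it is the first.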

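General Election 2020

The dust has settled on Singapore’s 2020 general election, which took place a week ago. This is the third general election I have lived through in Singapore. The first was in 2011, when we had just arrived in Singapore; as foreigners we had no vote and were purely spectators. The second was in 2015, when we had just become citizens and could vote for the first time, so we paid more attention. This third one, held in a year as unusual as 2020, felt different from the previous two in many ways.

As for the result, the People’s Action Party (PAP) did only passably. It won 83 seats and continues to govern with more than a two-thirds majority, but its vote share was only a little over 61%, down sharply from 2015 and only slightly above the 60% of 2011. It also lost another group representation constituency (GRC), its second after 2011. Worse, in several large GRCs the PAP only scraped through. Even Heng Swee Keat, the designated fourth-generation successor as prime minister, nearly came to grief in East Coast GRC. The Workers’ Party, Singapore’s largest opposition party, continued its strong advance, gaining four elected MPs and one more GRC. Perhaps people inevitably grow more conservative with age, but I cannot tell whether this result is a blessing or a curse for Singapore’s political future.

Before the election, the opposition questioned the ruling party’s decision to hold it during the pandemic, suggesting it wanted to exploit the crisis to win big. The Workers’ Party secretary-general even warned that the opposition risked being wiped out. I rather think the ruling party may have foreseen that the pandemic would drag on and the economy would keep deteriorating, so the longer the election was delayed, the worse it would be for the ruling party. That would explain holding the election before the second-quarter economic data came out. Although the result fell short of expectations, the PAP did not appear very disappointed.

A hot topic after the election is whether Singapore will move toward a two-party system. It looks like an unavoidable trend. Singapore’s young people grew up in affluence and have never experienced poverty or turmoil. After a long period of peace, people want change, and the PAP’s voter base will only keep shrinking. The result of this election also sends a signal to everyone interested in entering politics: if you want to pursue your political ambitions, you do not have to join the PAP; joining the opposition may well be the faster route, as it was for the Workers’ Party’s Jamus Lim and Raeesah Khan. A more diverse Parliament seems to be progress for democracy, but Singaporeans also need to think seriously about how to avoid the ills of partisan politics while Parliament diversifies. How can we keep society from being torn apart, divided and polarized? After all, whether it is long-established democracies such as the United Kingdom and the United States, or Taiwan and Hong Kong, regarded as models of democracy among Chinese societies, all have been plagued by partisan politics and populism in recent years. What reason does Singapore have to believe that its democracy will not run into the same problems once the opposition grows stronger? And in an increasingly chaotic world, can Singapore withstand the turmoil of partisan politics?

The results also show, once again, that the PAP’s electoral strategy of having ministers anchor GRCs no longer works, especially when a minister is parachuted into a GRC just before the election. Even Heng Swee Keat might well have lost if he were not the presumptive next prime minister. By contrast, look at the PAP MPs who won single-member constituencies (SMCs) by large margins. Tin Pei Ling was heavily questioned when she first stood in 2011; she has now successfully defended her constituency twice, with her vote share reaching a new high of 71.74%. Sun Xueling’s ward was carved out as an SMC for the first time; after her re-election, a resident told an interviewer that she served the constituency well and her win was expected. The Workers’ Party’s constituencies are also solid. All of this shows that working the ground year-round is far more effective than cramming at the last minute during the election.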

My Docs Site

Sometimes when I want to write technical posts or articles, I cannot find a proper place to post them. I know I can post them on this blog, and I have been doing that all along. But when you have a series of posts about one topic, especially when you want to keep those posts alive, version them and update them from time to time, blog posts are not a very good way to organize them. For this kind of post, writing in markdown and keeping the files in a git repo is a more natural choice, like what azure-docs does. I don’t use markdown for this blog site because I quite like the Gutenberg editor of WordPress, which is quite good for casual writing. WordPress also doesn’t have the versioning feature.

So I decided to create my own docs site. It is a static site built with mdbook and hosted on GitHub with GitHub Pages. mdbook is a tool, written in Rust, for creating online books from markdown files. I chose it because it is a single binary file and quite easy to use. I can easily run it in GitHub Actions to build the site automatically.

I don’t know yet exactly what I am going to put on my docs site. A rough idea is that it will hold the technical posts or articles I write, especially Azure related ones, that are not a good fit for a blog. The first item on the site is a series of tutorials that walks step by step through the integration of Application Gateway, API Management and the self-hosted gateway in an internal virtual network. It consists of multiple parts organized into sections, which makes it too heavy to publish as blog posts.

So if I write something similar in the future, I will post it there.

sttf-url-generator — A Chrome/Edge Extension

In the past few days, I created a small Chrome/Edge extension, sttf-url-generator, and published it to both the Chrome Web Store and Edge Add-ons. All the extension does is implement the user sharing use case described in the text fragment spec:

With text fragments, browsers may implement an option to ‘Copy URL to here’ when the user opens the context menu on a text selection. The browser can then generate a URL with the text selection appropriately specified, and the recipient of the URL will have the specified text conveniently indicated.

The idea for this extension originated from an internal discussion with my colleagues. As customer engineers, we need to share document links with customers quite often. Quoting the text from the document directly in the email can make the email awfully long, while just sending a link may not help customers quickly locate the information in a long document. We need a better way to share the information so that the customer can get to it quickly and accurately.

Text fragments are a perfect solution for this case. They are a new feature supported natively by browsers, and they are available in the latest versions of Chrome and Edge. We just need a way to easily generate a URL with the text fragment. Browsers might provide such a feature in the future, but until then I hope this extension can help address the need.

The implementation of the extension is quite simple, just several lines of JavaScript. I don’t use any framework or libraries, and rely purely on the DOM when I need to deal with HTML elements. Node and webpack are used simply because I’d like to use eslint to lint the code and babel to minify the js files.

The more interesting part is packaging the extension and submitting it to Chrome and Edge. To meet the requirements of these stores, I had to consider the permissions required by the extension carefully, create the images and assets, set up a home page with GitHub Pages, and even write a privacy policy myself. These tasks took more time than the coding itself, but it was a good experience of bringing an idea to production. I learned a lot from it.

The extension currently has 3 features: copy the generated URL, open the generated URL in a new tab, and copy the text fragment and the generated URL in markdown format. I think these 3 features are more than enough for a single-purpose extension.
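To make the idea concrete, a text fragment URL is the page URL followed by `#:~:text=` and the percent-encoded text to highlight. The sketch below is in Python rather than the extension's JavaScript and is not the extension's actual code. The function name is hypothetical, and it handles only the simplest form of the directive (no prefix, suffix or text range).

```python
from urllib.parse import quote, urldefrag


def text_fragment_url(page_url: str, selected_text: str) -> str:
    """Build a URL that asks the browser to highlight selected_text."""
    # Simplification: drop any existing fragment such as "#intro".
    base_url, _old_fragment = urldefrag(page_url)
    # Percent-encode everything. quote() never encodes "-", but the dash
    # has a special meaning inside a text directive, so encode it by hand.
    encoded = quote(selected_text, safe="").replace("-", "%2D")
    return f"{base_url}#:~:text={encoded}"


print(text_fragment_url("https://example.com/doc#intro", "node allocatable, in AKS"))
# https://example.com/doc#:~:text=node%20allocatable%2C%20in%20AKS
```

The comma and the dash are encoded because the directive syntax uses them as separators between the optional prefix, start, end and suffix parts.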

Hope you enjoy it.

Node Allocatable in AKS

Kubernetes’ Node Allocatable feature allows the cluster to reserve node resources for the system daemons of the OS and for Kubernetes itself. For example, when I ran kubectl describe node for a node in my AKS cluster, I got the following capacity related output. The size of this node was Standard DS2 v2, which has 2 CPU cores and 7 GB of memory.

Capacity:
  attachable-volumes-azure-disk:  8
  cpu:                            2
  ephemeral-storage:              101445900Ki
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         7113660Ki
  pods:                           110
Allocatable:
  attachable-volumes-azure-disk:  8
  cpu:                            1900m
  ephemeral-storage:              93492541286
  hugepages-1Gi:                  0
  hugepages-2Mi:                  0
  memory:                         4668348Ki
  pods:                           110

From the output, out of 2 cores and 7 GB of memory, 1900 millicores and about 66% of the memory (4668348/7113660) were allocatable to pods. The details of how these numbers are calculated are described in this document. In short, the configuration of node allocatable in AKS is as follows:

CPU

CPU cores on host            1    2    4    8    16   32   64
Kube-reserved (millicores)   60   100  140  180  260  420  740

Memory

  • eviction-hard: 750Mi, which is the default configuration of the upstream aks-engine.
  • kube-reserved: regressive rate
    • 25% of the first 4 GB of memory
    • 20% of the next 4 GB of memory (up to 8 GB)
    • 10% of the next 8 GB of memory (up to 16 GB)
    • 6% of the next 112 GB of memory (up to 128 GB)
    • 2% of any memory above 128 GB
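The rules above can be turned into a small calculation. The following Python sketch is my own reading of them, not AKS code. It assumes the tiers are applied to the nominal VM memory size (7 GiB for the node above) and that the reserved amount is truncated to whole MiB. Both are assumptions, chosen because they reproduce the numbers in the Linux output above. All names are hypothetical.

```python
MIB_PER_GIB = 1024
KIB_PER_MIB = 1024

# (tier size in GiB, percent reserved), from the kube-reserved list above
MEMORY_TIERS = [(4, 25), (4, 20), (8, 10), (112, 6)]
ABOVE_TIERS_PERCENT = 2   # applies to memory above 128 GB
EVICTION_HARD_MIB = 750

# CPU cores on host -> kube-reserved millicores, from the CPU table above
CPU_RESERVED_MILLICORES = {1: 60, 2: 100, 4: 140, 8: 180, 16: 260, 32: 420, 64: 740}


def kube_reserved_memory_mib(total_gib: float) -> float:
    """Apply the regressive rate tier by tier and return MiB reserved."""
    remaining_mib = total_gib * MIB_PER_GIB
    reserved_mib = 0.0
    for tier_gib, percent in MEMORY_TIERS:
        chunk_mib = min(remaining_mib, tier_gib * MIB_PER_GIB)
        reserved_mib += chunk_mib * percent / 100
        remaining_mib -= chunk_mib
    # Whatever is left lies above the last tier.
    reserved_mib += remaining_mib * ABOVE_TIERS_PERCENT / 100
    return reserved_mib


def estimate_allocatable(capacity_kib: int, nominal_gib: float, cores: int):
    """Return (allocatable memory in KiB, allocatable CPU in millicores)."""
    # Assumption: the reservation is truncated to whole MiB.
    reserved_kib = int(kube_reserved_memory_mib(nominal_gib)) * KIB_PER_MIB
    memory_kib = capacity_kib - reserved_kib - EVICTION_HARD_MIB * KIB_PER_MIB
    cpu_millicores = cores * 1000 - CPU_RESERVED_MILLICORES[cores]
    return memory_kib, cpu_millicores


print(estimate_allocatable(7113660, 7, 2))  # (4668348, 1900)
```

For the DS2 v2 node this gives 25% of 4 GiB plus 20% of 3 GiB, about 1638 MiB of kube-reserved memory. Subtracting that and the 750Mi eviction threshold from the 7113660Ki capacity gives 4668348Ki, matching the Allocatable section of the Linux output.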

However, the above memory reservation doesn’t seem to apply to Windows nodes. The following is the output I got when I ran kubectl describe node for a Windows node. More memory was reserved on it.

Capacity:
  attachable-volumes-azure-disk:  8
  cpu:                            2
  ephemeral-storage:              133703676Ki
  memory:                         7339572Ki
  pods:                           30
Allocatable:
  attachable-volumes-azure-disk:  8
  cpu:                            1900m
  ephemeral-storage:              133703676Ki
  memory:                         3565108Ki
  pods:                           30

The document doesn’t have any information about Windows nodes. GKE reserves approximately 1.5 times more resources on Windows Server nodes, but I’m not sure whether it’s the same for AKS. I’ve opened an issue to ask for further information.

This post provides a good comparison of reserved resources for 3 major cloud offerings: AKS, GKE and EKS.

PowerToys for Windows 10

I’ve been using Microsoft PowerToys on my work machine for some time and find myself relying on it more and more every day. PowerToys is a very old name on Windows. The first version of PowerToys came with Windows 95 more than two decades ago. It’s actually a fantastic idea to build the new set of productivity tools on top of a legacy brand. It feels like the good old days are finally connected with the new era, and the history continues.

I cannot remember exactly when I last used PowerToys for Windows. It could have been around the time of Windows 98, when I was still in college, a long time ago. Honestly, I was not a fan of it at that time. Although I’ve forgotten why I didn’t like it, it was not something that I had to install on my machine. But PowerToys for Windows 10 has changed my mind, and it has made its way onto my must-install software list.

The two PowerToys features that I use most are FancyZones and File Explorer Preview. As I’m using a 4K monitor, FancyZones helps me make better use of the screen space. It makes the laptop screen feel so redundant that I turn it off most of the time. File Explorer Preview is an add-on for Windows File Explorer that lets me preview file content directly in Explorer. The best part is that it supports markdown preview. Now I can read those README files without even opening them in a markdown editor.

Another tool that I am going to use frequently is PowerToys Run, which was just released with version 0.18. I have always wanted such a tool to launch apps quickly, and I’ve already tried some third-party tools. PowerToys Run looks quite promising.

PowerToys is quite stable. I haven’t hit any problems with it since I started using it. So if you are on Windows 10 and haven’t tried it, maybe it’s time. 🙂

Return HTTP 405 When the HTTP Method Does Not Match in Azure API Management

In Azure API Management, when the HTTP method of a request doesn’t match the one defined in the corresponding operation, APIM returns the status code HTTP 404 Resource Not Found to the client. For example, the OOTB Echo API defines the Retrieve resource (cached) operation with HTTP GET. If you call it with HTTP POST, you’ll get HTTP 404 Resource Not Found in the response.

The HTTP 404 returned by APIM in this scenario doesn’t strictly follow the definition of HTTP status codes. According to the definition, HTTP 405 Method Not Allowed is designated for this situation. Feedback on this issue was submitted to the APIM team, and according to the response it will be addressed in the future. Until then, we have to use a workaround. Here is how you can do it with policies in APIM.

Handle the error

When APIM fails to identify an API or operation, it raises a configuration error, which results in HTTP 404. What we need to do is handle this error and change the status code to HTTP 405. In this way, you avoid the overhead of creating operations for each of the HTTP methods just to handle the situation. The next question is at which scope the error should be handled. Depending on the configuration of your APIM, I think you can handle the error at either the all operations scope or the all APIs scope.

The policy code

The following policy code is a sample for the all operations scope of the Echo API.

<on-error>
    <base />
    <choose>
        <when condition="@(context.LastError.Source == "configuration" && context.Request.Url.Path == "/echo/resource-cached")">
            <return-response>
                <set-status code="405" reason="Method not allowed" />
                <set-body>@{
                    return new JObject(
                        new JProperty("status", "HTTP 405"),
                        new JProperty("message", "Method not allowed")
                    ).ToString();
                }</set-body>
            </return-response>
        </when>
        <otherwise />
    </choose>
</on-error>

The tricky part is in the <when> condition. The first part of the condition checks whether this is a configuration error. If it is, the second part tests whether the request targets the Retrieve resource (cached) operation. The second test avoids rewriting the status code when a real HTTP 404 happens.

You may wonder why I used context.Request rather than context.Operation to test which operation it is. The reason is that APIM sets context.Operation to null in this case because it cannot identify the operation (which is why the configuration error happens).

You can use this workaround to return HTTP 405 until APIM fixes its behavior.

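Nineteen Years

Just now on LinkedIn I saw a post celebrating SharePoint’s 19th birthday. Jeff Teper, one of SharePoint’s founders, reshared it and said that next year everyone should get together to celebrate the 20th. It suddenly stirred some feelings in me.

SharePoint was the first server product I focused on after joining Microsoft. For roughly ten years, on and off, my work revolved around it, so I am quite attached to it. Many years ago I wrote a few blog posts on SharePoint’s early history (A Brief History of SharePoint I, II, III). In 2011 I gave up my Chinese New Year holiday and went to Redmond for the training and exams for Microsoft Certified Master for SharePoint, the highest-level SharePoint certification at the time. I even thought for a while that my whole career would go hand in hand with SharePoint.

But in 2015, when Microsoft truly began its shift to cloud services, I suddenly found that my accumulated SharePoint experience seemed to have no place to be used. Once SharePoint became a SaaS cloud service as part of Office 365, the value of a SharePoint consultant to customers dropped sharply. SaaS is plug and play: customers no longer need to deploy and manage on-premises servers, worry about database and storage performance, or know the best practices for SharePoint operations. SaaS is also far less customizable, so customers can no longer use SharePoint as a development platform for building all sorts of applications. That was when I realized that my time with SharePoint had run its course. When the company transforms, it is time for me to transform too.

In the last few years I have not worked on SharePoint projects or followed SharePoint’s progress. My focus has shifted to Azure. The web service and database development experience I accumulated on SharePoint still keeps proving useful in Azure projects. As one of the core services of Office 365, SharePoint should do very well, but I probably will not be attending its 20th birthday party.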

Azure Batch – Create a Custom Pool using Shared Image Gallery

If you have a custom image in a Shared Image Gallery and you want to use it to create a pool in Azure Batch, the document Use the Shared Image Gallery to create a custom pool provides pretty good guidance. I followed it to test the scenario and hit two minor issues.

  • As mentioned in another doc, AAD authentication is also a prerequisite for using Shared Image Gallery. If you use --shared-key-auth with az batch account login, you will hit an authentication error with the Azure CLI. I raised an issue for the document and hopefully a note will be added to it.
  • There is no sample code that demonstrates how to create a pool from a Shared Image Gallery image with Python.

So I wrote a simple sample in Python. It is based on the latest version (9.0.0) of the Azure Batch package for Python, and it uses a service principal for AAD authentication. The custom image I used for the test was built on top of Ubuntu 18.04-LTS, so the node agent SKU is ubuntu 18.04. It needs to be changed accordingly if another OS version is used.

# Import the required modules from the
# Azure Batch Client Library for Python
import azure.batch as batch
import azure.batch.models as batchmodels
from azure.common.credentials import ServicePrincipalCredentials

# Specify Batch account credentials
account = "<batch-account-name>"
batch_url = "<batch-account-url>"
ad_client_id = "<client id of the SP>"
ad_tenant = "<tenant id>"
ad_secret = "<secret of the SP>"

# Pool settings
pool_id = "LinuxNodesSamplePoolPython"
vm_size = "STANDARD_D2_V3"
node_count = 1

# Initialize the Batch client
creds = ServicePrincipalCredentials(
    client_id=ad_client_id,
    secret=ad_secret,
    tenant=ad_tenant,
    resource="https://batch.core.windows.net/"
)
client = batch.BatchServiceClient(creds, batch_url)

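# Create the unbound pool
new_pool = batchmodels.PoolAddParameter(id=pool_id, vm_size=vm_size)
new_pool.target_dedicated_nodes = node_count

# Configure the start task for the pool and run it
# as an elevated (admin) auto-user
start_task = batchmodels.StartTask(
    command_line="printenv AZ_BATCH_NODE_STARTUP_DIR",
    user_identity=batchmodels.UserIdentity(
        auto_user=batchmodels.AutoUserSpecification(
            scope=batchmodels.AutoUserScope.pool,
            elevation_level=batchmodels.ElevationLevel.admin
        )
    )
)
new_pool.start_task = start_task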

# Create an ImageReference which specifies the custom image
# version in the Shared Image Gallery to install on the nodes.
ir = batchmodels.ImageReference(
    virtual_machine_image_id="<resource id of the image version in sig>"
)

# Create the VirtualMachineConfiguration, specifying
# the VM image reference and the Batch node agent to
# be installed on the node.
vmc = batchmodels.VirtualMachineConfiguration(
    image_reference=ir,
    node_agent_sku_id="batch.node.ubuntu 18.04"
)

# Assign the virtual machine configuration to the pool
new_pool.virtual_machine_configuration = vmc

# Create pool in the Batch service
client.pool.add(new_pool)

Update: I polished the above sample code and pushed it into the document I mentioned at the beginning of this post via a PR. The Python sample code in that document is based on the one in this post.