Enable Virtual Node on an Existing AKS Cluster

The virtual node can be enabled when you create a new AKS cluster, and the Azure documentation covers how to do so with either the Azure CLI or the Azure portal. Because the virtual node is an AKS add-on, it can also be enabled on an existing AKS cluster, as long as the cluster uses Azure CNI as its network plug-in.
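Whether an existing cluster meets the Azure CNI requirement can be checked from the CLI before going further. The following is a minimal sketch, assuming the Azure CLI is installed and logged in; the function names are hypothetical helpers, not part of the CLI. A reported value of azure means Azure CNI, while kubenet means the cluster does not qualify.

```shell
# Print the network plug-in of a cluster ("azure" means Azure CNI).
# $1 = resource group, $2 = cluster name (both supplied by you).
aks_network_plugin() {
  az aks show --resource-group "$1" --name "$2" \
    --query networkProfile.networkPlugin --output tsv
}

# Succeed only when the cluster reports Azure CNI.
uses_azure_cni() {
  [ "$(aks_network_plugin "$1" "$2")" = "azure" ]
}
```

For example, `uses_azure_cni my-rg my-cluster && echo "Azure CNI in use"` prints the message only for a qualifying cluster.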

The following procedure enables the virtual node on an existing AKS cluster.

1. Create a new subnet in the VNET that the AKS cluster is in. The virtual node is based on Azure Container Instances (ACI). When container groups are deployed to a VNET, the subnet is delegated to ACI and can then only be used for container groups, so don’t use a subnet that is already used by other node pools.
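The subnet can be created in the portal or from the CLI. As a rough sketch of the CLI route, the hypothetical helper below wraps az network vnet subnet create; the address prefix is whatever free range fits your VNET, and none of the argument values come from the original procedure.

```shell
# Create a dedicated subnet for the virtual node in the cluster's VNET.
# $1 = resource group of the VNET
# $2 = VNET name
# $3 = name for the new subnet
# $4 = address prefix in CIDR form, chosen from unused space in the VNET
create_virtual_node_subnet() {
  az network vnet subnet create \
    --resource-group "$1" \
    --vnet-name "$2" \
    --name "$3" \
    --address-prefixes "$4"
}
```

A call might look like `create_virtual_node_subnet my-rg my-vnet virtual-node-subnet 10.241.0.0/16`, where all four values are placeholders.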

2. Run the following command to enable the virtual node add-on.

az aks enable-addons -n <cluster-name> -g <resource-group-name> \
-a virtual-node --subnet <subnet-name>

3. When the command completes successfully, the virtual node is enabled on the cluster. You should see the virtual node when you run kubectl get nodes. If you check the cluster in the Azure portal, the Overview page should show that virtual node pools are enabled. In the AKS node resource group, a managed identity is created for the ACI connector, and a network profile is created as well. You can view the network profile with the command: az network profile list --resource-group <name of aks managed rg>.

If you cannot see the virtual node after the add-on is enabled, a possible reason is that the managed identity for the ACI connector doesn’t have the proper permission on the VNET. This can happen especially when the VNET is not in the node resource group. In that case, you can manually grant the managed identity the Contributor role on the VNET.
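The manual grant can be done in the portal or with the CLI. The sketch below shows one possible CLI route under stated assumptions: the helper names are hypothetical, the ACI connector identity is picked by eye from the list of identities in the node resource group, and the scope is the full resource ID of the VNET.

```shell
# List the managed identities in the node resource group with their
# principal IDs, so the ACI connector identity can be picked out by name.
# $1 = node resource group (the group that AKS manages)
list_node_rg_identities() {
  az identity list --resource-group "$1" \
    --query "[].{name:name, principalId:principalId}" --output table
}

# Print the full resource ID of a VNET.
# $1 = resource group of the VNET, $2 = VNET name
vnet_id() {
  az network vnet show --resource-group "$1" --name "$2" \
    --query id --output tsv
}

# Grant the Contributor role on the VNET to a managed identity.
# $1 = principal (object) ID of the identity, $2 = VNET resource ID
grant_vnet_contributor() {
  az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role Contributor \
    --scope "$2"
}
```

After the role assignment is in place, it may take a few minutes before the virtual node shows up in kubectl get nodes.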

4. Deploy a pod to the virtual node by using a nodeSelector and tolerations, such as this sample. Follow the steps in the same document to test whether the pod works.
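To illustrate what the nodeSelector and tolerations look like, here is a minimal sketch of a single-pod manifest applied with kubectl. The pod name and image are illustrative assumptions, and the labels and taint keys are the ones commonly used for the ACI virtual node; it is worth confirming them against the output of kubectl describe node virtual-node-aci-linux on your own cluster.

```shell
# Print a pod manifest that targets the virtual node.
# The pod name and image are placeholders for illustration only.
virtual_node_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: aci-helloworld
spec:
  containers:
  - name: aci-helloworld
    image: mcr.microsoft.com/azuredocs/aci-helloworld
    ports:
    - containerPort: 80
  # Select the virtual node by its labels (assumed; verify on your node).
  nodeSelector:
    kubernetes.io/role: agent
    kubernetes.io/os: linux
    type: virtual-kubelet
  # Tolerate the taints that keep ordinary pods off the virtual node.
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
  - key: azure.com/aci
    effect: NoSchedule
EOF
}

# Apply the manifest to the current kubectl context.
deploy_virtual_node_pod() {
  virtual_node_pod_manifest | kubectl apply -f -
}
```

Once deployed, `kubectl get pod aci-helloworld -o wide` should list the virtual node in the NODE column if scheduling worked.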

5. To remove the virtual node, follow the instructions in the remove virtual nodes section of the document. The virtual node also needs to be deleted from the cluster with the kubectl delete node virtual-node-aci-linux command. See the sample below.

# Disable the virtual node add-on
az aks disable-addons -a virtual-node -g <resource-group-name> -n <cluster-name>
# Delete the virtual node from the cluster nodes
kubectl delete node virtual-node-aci-linux
# Delete the network profile
MRG=$(az aks show --resource-group <resource-group-name> \
  --name <cluster-name> --query nodeResourceGroup --output tsv)
NPID=$(az network profile list --resource-group "$MRG" --query '[0].id' --output tsv)
az network profile delete --id "$NPID" -y

The disable-addons command simply removes the virtual node add-on from the cluster. It doesn’t drain or delete the virtual node. If pods are still running on the virtual node, they end up in an error state, and the underlying ACI container groups are not removed either. It’s better to remove all pods from the virtual node before disabling the add-on.
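Before disabling the add-on, it can help to see which pods are still scheduled on the virtual node. The helper below is a small sketch with a hypothetical function name; it assumes the node is called virtual-node-aci-linux, as in the commands above, unless another name is passed.

```shell
# List every pod scheduled on the virtual node, across all namespaces.
# $1 = node name (optional, defaults to virtual-node-aci-linux)
pods_on_virtual_node() {
  kubectl get pods --all-namespaces \
    --field-selector "spec.nodeName=${1:-virtual-node-aci-linux}" \
    --output wide
}
```

Any pods it reports can then be removed by deleting or rescheduling the workloads that own them, after which the add-on can be disabled.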