
When I deploy new jobs and services to an Azure Kubernetes Service (AKS) cluster, the pods fail to obtain AAD access tokens that include all of the granted permissions. If new permissions are added on the same day, either before or after a deployment, the tokens do not pick them up. So far I have observed this with permissions granted to Active Directory groups at Key Vault, Storage Account, and SQL database scopes.

Example: I have a .NET 5.0 C# API running on 3 pods, with anti-affinity rules placing each pod on a separate node. The application reads data from a SQL database. I made a release and added the database permissions afterwards. Things I have tried so far to make the application refresh its access tokens:

kubectl delete pods --all -n <namespace>, which created 3 new pods that again failed due to insufficient permissions.

kubectl apply -f deployment.yaml to deploy a new version of the container image; again, all 3 pods kept failing.

kubectl delete -f deployment.yaml followed by kubectl apply -f deployment.yaml to remove the old Kubernetes object and create a new one. This resolved the issue on 2 of the 3 pods; the third kept failing due to insufficient permissions.

kubectl delete namespace <namespace> to remove the entire namespace, including all of its configuration, and then recreate it. Surprisingly, again 2 of the 3 pods ran with the correct permissions and the last one did not.

The commands were run more than one hour after the permissions were added to the group. The database tokens are valid for 24 hours, and when I have seen this issue with CronJobs, I had to wait one day for the task to execute correctly (none of the steps above worked in the CronJob scenario). The token validity period kept changing, which implies that the pods are requesting new access tokens, yet those tokens still exclude the most recently added permissions. The only solution I have found that works every time is to destroy the cluster and recreate it, which is not viable in any production scenario.
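One way to check whether a pod really receives a freshly issued token is to decode the JWT payload of the token it obtains and compare the iat (issued at) and exp (expiry) claims across pods. The sketch below assumes bash with GNU coreutils and compact JSON claims; jwt_payload, jwt_claim and token_lifetime are hypothetical helper names, and the helpers only decode the token without verifying its signature.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting an AAD access token (a JWT).
# Assumes GNU coreutils base64 and compact JSON claims such as "exp":1700086400.
# They only decode the payload; they do NOT validate the signature.

# Print the decoded JSON payload (the second dot-separated segment) of a JWT.
jwt_payload() {
  local payload
  # Convert base64url characters back to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | sed -e 's/-/+/g' -e 's|_|/|g')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Print a numeric claim (for example iat or exp) from a JWT.
jwt_claim() {
  jwt_payload "$1" | sed -n "s/.*\"$2\":\([0-9][0-9]*\).*/\1/p"
}

# Print the token lifetime in seconds (exp - iat).
token_lifetime() {
  local iat exp
  iat=$(jwt_claim "$1" iat)
  exp=$(jwt_claim "$1" exp)
  echo $(( exp - iat ))
}
```

For example, jwt_claim "$TOKEN" iat run against tokens from the pod on node 00 and from the other pods may show whether node 00 keeps returning an older token.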

The failing pod in my example was always the one running on node 00, which made me think there may be an extra caching layer on the initial node of the cluster. However, I still do not understand why the other 2 pods ran without issue, or how to restart my pods or refresh the access token to minimise the wait time until the new permissions take effect.
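To confirm that the failures consistently follow one node, the pod-to-node mapping can be listed with kubectl's custom-columns output. pods_by_node is a hypothetical helper name, and the namespace is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each pod in a namespace with the node it runs on
# and its phase, to check whether failing pods line up with a single node.
pods_by_node() {
  local namespace="$1"
  kubectl get pods -n "$namespace" \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase
}
```

Usage: pods_by_node <namespace>. Running it after each redeploy would show whether the failing replica is always scheduled on node 00.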

Kubernetes version: 1.21.7. The cluster has neither AKS-managed AAD nor pod identity enabled. All RBAC permissions are granted to the cluster's managed identity (MSI) through AD groups.
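Since neither pod identity nor AKS-managed AAD is enabled, the pods presumably obtain tokens for the cluster's managed identity from the Azure Instance Metadata Service (IMDS) endpoint on their node. Assuming that is the case, and that curl is available in the container image, the sketch below requests a token from a shell inside a pod so that tokens issued on different nodes can be compared. get_imds_token is a hypothetical name; the endpoint, api-version and the resource and client_id query parameters are the documented IMDS managed-identity values, while the sed extraction assumes compact JSON instead of using a real JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch an access token for the node's managed identity from IMDS.
# Assumes the pod can reach 169.254.169.254 and that curl exists in the image.
#   resource:  e.g. https://database.windows.net/ for Azure SQL
#   client_id: optional, needed when the node has several user-assigned identities
get_imds_token() {
  local resource="$1" client_id="${2:-}"
  local url="http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=${resource}"
  if [ -n "$client_id" ]; then
    url="${url}&client_id=${client_id}"
  fi
  # IMDS requires the Metadata header; -f makes curl fail on HTTP errors.
  curl -sf -H "Metadata: true" "$url" |
    sed -n 's/.*"access_token":"\([^"]*\)".*/\1/p'
}
```

Usage: get_imds_token https://database.windows.net/ prints the raw access token, whose claims can then be inspected on each node.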

1 Answer


Please check whether the following works around the issue in your case.

To access Kubernetes resources, you must have access to the AKS cluster, the Kubernetes API, and the Kubernetes objects. Ensure that you are either a cluster administrator or a user with the appropriate permissions to access the AKS cluster. Things you need to do, if you haven't already:

  1. Enable Azure RBAC on your existing AKS cluster, using:

    az aks update -g myResourceGroup -n myAKSCluster --enable-azure-rbac

  2. Create a role that allows read access to all other Pods and Services by adding the necessary roles (Azure Kubernetes Service Cluster User Role, Azure Kubernetes Service RBAC Reader/Writer/Admin/Cluster Admin) to the user. See Microsoft Docs.
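The role assignment can be scripted along the lines of the Microsoft Docs page on AKS role assignments: look up the cluster's resource ID, then assign the role at that scope. This is a sketch assuming the Azure CLI is logged in and Azure RBAC is enabled on the cluster (the AKS RBAC roles only take effect in that mode); assign_aks_role is a hypothetical helper name, and the resource group, cluster name and assignee are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign a built-in AKS role to a user or group at cluster scope.
# Assumes an authenticated Azure CLI session.
assign_aks_role() {
  local resource_group="$1" cluster="$2" assignee="$3" role="$4"
  local aks_id
  # Resolve the full resource ID of the AKS cluster to use as the role scope.
  aks_id=$(az aks show -g "$resource_group" -n "$cluster" --query id -o tsv) || return 1
  az role assignment create --assignee "$assignee" --role "$role" --scope "$aks_id"
}
```

Usage: assign_aks_role myResourceGroup myAKSCluster <object-id> "Azure Kubernetes Service RBAC Reader".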

Also check the Troubleshooting guide:

  1. Also check whether you need the "Virtual Machine Contributor" and "Storage Account Contributor" roles on the resource group that contains the pods, and whether the namespace is specified for that pod, in case it was missed (Stack Overflow reference). Also check whether a firewall is restricting network access from that pod.

Resetting the kubeconfig context with the az aks get-credentials command may clear a previously cached authentication token for a user (Reference):

    az aks get-credentials --resource-group myResourceGroup --name myAKSCluster --overwrite-existing

Please also check the other references below:

  1. kubernetes - Permissions error - Stack Overflow
  2. create-role-assignments-for-users-to-access-cluster | microsoft docs
  3. user can't access to AKS cluster with RBAC enabled (github.com)
  4. kubernetes - Stack Overflow
kavyaS