
I've been trying to manage an Azure Kubernetes Service (AKS) instance via Terraform. When I create the AKS instance via the Azure CLI per this MS tutorial, then install an ingress controller with a static public IP, per this MS tutorial, everything works fine. This method implicitly creates a service principal (SP).

When I create an otherwise exact duplicate of the AKS cluster via Terraform, I am forced to supply the service principal explicitly. I gave this new SP "Contributor" access to the cluster's entire resource group, yet when I get to the step to create the ingress controller (using the same command the second tutorial provides: helm install stable/nginx-ingress --set controller.replicaCount=2 --set controller.service.loadBalancerIP="XX.XX.XX.XX"), the ingress service comes up but never acquires its public IP. The IP status remains "<pending>" indefinitely, and I can find nothing in any log about why. Are there logs that should tell me why my IP is still pending?

Again, I am fairly certain that, other than the SP, the Terraform AKS cluster is an exact duplicate of the one created based on the MS tutorial. Running terraform plan finds no differences between the two. Does anyone have any idea what permission my AKS SP might need or what else I might be missing here? Strangely, I can't find ANY permissions assigned to the implicitly created principal via the Azure portal, but I can't think of anything else that might be causing this behavior.

Not sure if it's a red herring or not, but other users have complained about a similar problem in the context of issues opened against the second tutorial. Their fix always appears to be "tear down your cluster and retry", but that isn't an acceptable solution in this context. I need a reproducible working cluster and azurerm_kubernetes_cluster doesn't currently allow for building an AKS instance with an implicitly created SP.

Derek

3 Answers


I'm going to answer my own question, for posterity. It turns out the problem was the resource group where I created the static public IP. AKS clusters use two resource groups: the group that you explicitly created the cluster in, and a second group that the cluster creates implicitly. That second, implicit resource group always gets a name starting with "MC_" (the rest of the name is derived from the explicit RG, the cluster name, and the region).

Anyhow, the default AKS configuration requires that the public IP be created within that implicit resource group. If you created the AKS cluster with Terraform, the name of that group is exported as ${azurerm_kubernetes_cluster.NAME.node_resource_group}.
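Outside Terraform, the same name can be looked up from the cluster or, for a cluster that kept the default naming, reconstructed. The sketch below assumes the default pattern MC_<resource group>_<cluster name>_<region> described above; `default_node_rg` is a hypothetical helper, and because Azure may shorten very long names, querying the cluster (shown in the comment) is the safer option.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the *default* AKS node resource group name,
# assuming the cluster was created without overriding that name.
# Pattern: MC_<resource group>_<cluster name>_<region>
#
# The authoritative value comes from the cluster itself, e.g.:
#   az aks show --resource-group "$RG" --name "$CLUSTER" \
#     --query nodeResourceGroup -o tsv
default_node_rg() {
  rg="$1"        # resource group the cluster was created in
  cluster="$2"   # AKS cluster name
  location="$3"  # region, e.g. eastus
  printf 'MC_%s_%s_%s\n' "$rg" "$cluster" "$location"
}
```

For example, `default_node_rg myResourceGroup myAKSCluster eastus` prints MC_myResourceGroup_myAKSCluster_eastus, which is where the static IP would go under this workaround.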

EDIT 2019-05-23

Since writing this, we found a use case for which the workaround of using the MC_* resource group wasn't good enough. I opened a support ticket with MS, and they directed me to this solution: add the following annotation to your LoadBalancer service (or ingress controller), and make sure that the AKS SP has at least Network Contributor rights in the destination resource group (myResourceGroup in the example below):

metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup

This solved it completely for us.
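For context, this is roughly where that annotation sits on a complete LoadBalancer Service. It is a minimal sketch, not the manifest from the support ticket: `render_static_ip_service`, the port, and the `app` selector label are hypothetical placeholders, and it sets the IP through `spec.loadBalancerIP`, as the commands in this thread do.

```shell
#!/bin/sh
# Hypothetical helper: print a LoadBalancer Service manifest that reuses an
# existing static public IP living in a resource group other than MC_*.
# Arguments: service name, static IP address, IP's resource group, app label.
render_static_ip_service() {
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
  annotations:
    # Tells the Azure cloud provider which resource group holds the IP
    service.beta.kubernetes.io/azure-load-balancer-resource-group: $3
spec:
  type: LoadBalancer
  loadBalancerIP: $2
  ports:
    - port: 80
  selector:
    app: $4
EOF
}

# Usage (placeholder values):
#   render_static_ip_service my-service XX.XX.XX.XX myResourceGroup my-app \
#     | kubectl apply -f -
```

The annotation alone is not enough; the SP still needs Network Contributor rights on the resource group named in it.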

Derek
  • That's not exactly true; you can use an IP address from any resource group in the same subscription, given the right setup. – 4c74356b41 May 14 '19 at 19:47
  • Yes, @4c74356b41. I followed up with MS and this requires an annotation on the load balancer or ingress controller. I've edited my original post to include this solution. – Derek May 23 '19 at 12:49
  • Didn't. Changed my mind about an upvote because you didn't direct me to a solution but only reported that there was one, which wasn't very helpful. I only followed up with a support ticket to MS because we found a use case that wasn't solved by my original workaround. – Derek May 23 '19 at 12:53
  • Well, I don't know how you configured your stuff, so I wouldn't know how to fix it, but it's quite obvious that everything works fine if configured properly, so you need to fix the SP rights. – 4c74356b41 May 23 '19 at 12:54
  • The SP rights were fine from the start. That was the first thing I fixed. The problem was a missing annotation. I've edited my answer to include the annotation. Thanks for mentioning the SP rights again, though. I will add that to my answer too. (You get an upvote for that. :) – Derek May 23 '19 at 12:57

Set Static IP Resource Group when Installing Helm Chart

Here is a minimal helm install command for the ingress-nginx controller that works when the static IP is in a different resource group than the cluster's managed node resource group.

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.replicaCount=1 \
  --set controller.service.externalTrafficPolicy=Local \
  --set controller.service.loadBalancerIP="$ingress_controller_ip" \
  --set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-resource-group"="$STATIC_IP_RESOURCE_GROUP"

The key is the last override to provide the resource group of the static IP.
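The backslashes in that last key matter: helm treats unescaped dots in a --set key as nesting separators, so the dots inside the annotation name have to be escaped for it to stay a single key. A small hypothetical helper (`helm_escape_key` is not part of helm) that does the escaping for an arbitrary annotation name:

```shell
#!/bin/sh
# Hypothetical helper: escape every dot in a string so that helm --set
# treats it as part of one map key instead of a path separator.
helm_escape_key() {
  printf '%s' "$1" | sed 's/\./\\./g'
}

# Usage sketch (STATIC_IP_RESOURCE_GROUP is a placeholder variable):
#   key="$(helm_escape_key service.beta.kubernetes.io/azure-load-balancer-resource-group)"
#   helm upgrade --install ingress-nginx ingress-nginx \
#     --repo https://kubernetes.github.io/ingress-nginx \
#     --namespace ingress-nginx \
#     --set "controller.service.annotations.${key}=${STATIC_IP_RESOURCE_GROUP}"
```

Quoting the whole --set argument keeps the shell from touching the backslashes the helper produced.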

Additional Note: Health Probe Endpoints

You may also need to customize the load balancer health probe if your root path doesn't return a successful HTTP response. We do this by additionally adding the following override (replace /healthz with your probe endpoint):

--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz

Versions

Kubernetes 1.22.6
ingress-nginx-4.1.0
ingress-nginx/controller:v1.2.0
Adrian

I can't comment yet, so I'm adding this as an answer.

Derek is right: you can use an existing static IP from a resource group other than the one the AKS cluster was provisioned in. The documentation page covers this. Just make sure you've done these two steps:

  1. Add "Network Contributor" role assignment for your AKS service principal to the resource group where your existing static IP is.

  2. Add the service.beta.kubernetes.io/azure-load-balancer-resource-group: myResourceGroup annotation to the ingress controller service, for example with the following command:

kubectl annotate service ingress-nginx-controller -n ingress service.beta.kubernetes.io/azure-load-balancer-resource-group=myResourceGroup
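Step 1 can be scripted with the Azure CLI. This sketch assumes a service-principal-based cluster, as in the question; the variable names and the `rg_scope` helper are hypothetical. The `az` calls are left as comments because they need a logged-in session, so only the role-assignment scope string is computed here (it has the same form that `az group show --name <rg> --query id -o tsv` returns).

```shell
#!/bin/sh
# Hypothetical helper: build the ARM scope of a resource group, the value
# passed to --scope when assigning a role on that group.
rg_scope() {
  printf '/subscriptions/%s/resourceGroups/%s\n' "$1" "$2"
}

# Usage sketch (placeholder variables; requires `az login`):
#   SP_ID="$(az aks show --resource-group "$AKS_RG" --name "$CLUSTER" \
#     --query servicePrincipalProfile.clientId -o tsv)"
#   SUB_ID="$(az account show --query id -o tsv)"
#   az role assignment create --assignee "$SP_ID" \
#     --role "Network Contributor" \
#     --scope "$(rg_scope "$SUB_ID" "$IP_RG")"
```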
Gleb Teterin