Problem
Using the Terraform google_service_account and google_project_iam_binding resources to attach roles/editor to a new service account removed the Google APIs Service Agent and the Compute Engine default service account from the IAM principals. GKE clusters can no longer be deleted or created because of this removal, although the Compute Engine default service account still remains under IAM Service Accounts.
To be precise, the account disappears from the IAM principals (which is what I mean by "deleted"), and the Compute Engine default service account is left broken: it can no longer manage Compute Engine resources, including GKE clusters and nodes.
Question
I believe this is a Terraform bug, but please help me understand whether I am missing something that would prevent the problem.
Please also advise whether there is a way to restore the Compute Engine default service account to the IAM principals with the Editor role.
Environment
$ terraform version
Terraform v1.0.4
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v4.6.0
.terraform.lock.hcl
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.
provider "registry.terraform.io/hashicorp/google" {
version = "4.6.0"
hashes = [
"h1:QbO4yjDrnoSpiYKSHrICNL1ZuWsl5J2rVRFj2kNg7xA=",
"zh:005a28a2c79f6b29680b0f57260c69c85d8a992688007b6e5645149bd379951f",
"zh:2604d825de72cf99b4899d7880837adeb19d371f48e419666e32c4c3cf6a72e9",
"zh:290da4eb18e44469480cf299bebce89f54e4d301f856cdffe2837b498878c7ec",
"zh:3e5ba1a55d38fa17533a18fc14a612e781ded76c6309734d3dc0a937be27eec1",
"zh:4a85de3cdb33c092d8ccfced3d7302934de0dd4f72bbcebd79d45afe0a0b6f85",
"zh:5fb1a79800833ae922aaba594a8b2bc83be1d254052e12e0ce8330ca0d8933d9",
"zh:679b9f50c6fe0476e74d37935f7598d46d6e9612f75b26a8ef1ca3c13144d06a",
"zh:893216e32378839668c51ef135af1676cd887d63e2edb6625cf9adad7bfa346f",
"zh:ad8f2fd19adbe4c10281ba9b3c8d5100877a9c541d3580bbbe9357714aa77619",
"zh:bff5d6fd15e98c12ee9ed98b0338761dc4a9ba671a37834926daeabf73c71783",
"zh:debdf15fbed8d63e397cd004bf65586bd2b93ce04e47ca51a7c70c1fe9168b87",
]
}
Reproduction Steps
Tested twice in different GCP projects and the issue was reproduced in the same manner.
Start
Start with a GCP project that does not have the Compute Engine API enabled, and hence has no Compute Engine default service account.
Enable Compute Engine API.
Compute Engine default service account gets created and appears both in IAM Principals and IAM Service Accounts.
Terraform apply
Apply the terraform script to create a service account with IAM bindings.
variable "PROJECT_ID" {
  type        = string
  description = "GCP Project ID"
  default     = "test-tf-sa"
}
variable "REGION" {
  type        = string
  description = "GCP Region"
  default     = "us-central1"
}
variable "roles_to_grant_to_service_account" {
  description = "IAM roles to grant to the service account"
  type        = list(string)
  default = [
    "roles/editor",
    "roles/iam.serviceAccountAdmin",
    "roles/resourcemanager.projectIamAdmin"
  ]
}
provider "google" {
  project = var.PROJECT_ID
  region  = var.REGION
}
resource "google_service_account" "terraform" {
  account_id   = "terraform"
  display_name = "terraform service account"
}
resource "google_project_iam_binding" "terraform" {
  project = var.PROJECT_ID
  #--------------------------------------------------------------------------------
  # Grant the service account to have the roles
  #--------------------------------------------------------------------------------
  members = [
    "serviceAccount:${google_service_account.terraform.email}"
  ]
  for_each = toset(var.roles_to_grant_to_service_account)
  role     = each.value
}
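One thing that may explain the behaviour: the provider documentation describes google_project_iam_binding as authoritative for the given role, so applying it for roles/editor sets that role's member list to exactly the members listed in the resource and drops any others. Below is a minimal sketch of a non-authoritative alternative. It assumes the block replaces the google_project_iam_binding resource in the configuration above and reuses its variables and service account; it has not been verified against this project.

```hcl
# Sketch only: non-authoritative alternative to google_project_iam_binding.
# google_project_iam_member adds a single member to a role and leaves the
# role's other members (for example Google-created service accounts) in place.
resource "google_project_iam_member" "terraform" {
  for_each = toset(var.roles_to_grant_to_service_account)

  project = var.PROJECT_ID
  role    = each.value
  member  = "serviceAccount:${google_service_account.terraform.email}"
}
```

With this form, terraform destroy would be expected to remove only the terraform service account from each role rather than the whole binding.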
$ terraform apply --auto-approve
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_project_iam_binding.terraform["roles/editor"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/editor"
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/iam.serviceAccountAdmin"
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] will be created
+ resource "google_project_iam_binding" "terraform" {
+ etag = (known after apply)
+ id = (known after apply)
+ members = (known after apply)
+ project = "test-tf-sa"
+ role = "roles/resourcemanager.projectIamAdmin"
}
# google_service_account.terraform will be created
+ resource "google_service_account" "terraform" {
+ account_id = "terraform"
+ disabled = false
+ display_name = "terraform service account"
+ email = (known after apply)
+ id = (known after apply)
+ name = (known after apply)
+ project = (known after apply)
+ unique_id = (known after apply)
}
Plan: 4 to add, 0 to change, 0 to destroy.
google_service_account.terraform: Creating...
google_service_account.terraform: Creation complete after 2s [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creating...
google_project_iam_binding.terraform["roles/editor"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creating...
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Creation complete after 9s [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/editor"]: Creation complete after 9s [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Still creating... [10s elapsed]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Creation complete after 10s [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
Terraform has deleted the Compute Engine default service account from the IAM principals
Immediately after terraform apply, check the IAM principals: the Compute Engine default service account has been removed from the IAM principals view.
As suggested by @JohnHanley, I clicked Include Google-provided role grants to unhide Google-managed service accounts. The original Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com is gone from the IAM principals view.
The gcloud projects get-iam-policy command does not show the Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com either.
$ GCP_PROJECT_ID=test-tf-sa
$ gcloud projects get-iam-policy $GCP_PROJECT_ID
bindings:
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.admin
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.instanceAdmin
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/compute.serviceAgent
- members:
- serviceAccount:service-1079157603081@container-engine-robot.iam.gserviceaccount.com
role: roles/container.serviceAgent
- members:
- serviceAccount:service-1079157603081@containerregistry.iam.gserviceaccount.com
role: roles/containerregistry.ServiceAgent
- members:
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/editor
- members:
- user:****@gmail.com
role: roles/owner
- members:
- serviceAccount:service-1079157603081@gcp-sa-pubsub.iam.gserviceaccount.com
role: roles/pubsub.serviceAgent
etag: BwXVf2S5fCQ=
version: 1
The service account still remains in the IAM Service Accounts menu, though.
Create GKE
Enable the Kubernetes Engine API and create a GKE cluster. At this point, the missing Compute Engine default service account did not hinder the GKE cluster creation, possibly because of eventual consistency.
terraform destroy
Run terraform destroy.
$ terraform destroy --auto-approve
google_service_account.terraform: Refreshing state... [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_project_iam_binding.terraform["roles/editor"]: Refreshing state... [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Refreshing state... [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Refreshing state... [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply":
# google_project_iam_binding.terraform["roles/editor"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/editor"
~ members = [
+ "serviceAccount:1079157603081@cloudservices.gserviceaccount.com",
# (1 unchanged element hidden)
]
# (2 unchanged attributes hidden)
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/iam.serviceAccountAdmin"
# (3 unchanged attributes hidden)
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] has been changed
~ resource "google_project_iam_binding" "terraform" {
~ etag = "BwXVe+z+aCU=" -> "BwXVfBieTDw="
id = "test-tf-sa/roles/resourcemanager.projectIamAdmin"
# (3 unchanged attributes hidden)
}
Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to
undo or respond to these changes.
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# google_project_iam_binding.terraform["roles/editor"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/editor" -> null
- members = [
- "serviceAccount:1079157603081@cloudservices.gserviceaccount.com",
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/editor" -> null
}
# google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/iam.serviceAccountAdmin" -> null
- members = [
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/iam.serviceAccountAdmin" -> null
}
# google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"] will be destroyed
- resource "google_project_iam_binding" "terraform" {
- etag = "BwXVfBieTDw=" -> null
- id = "test-tf-sa/roles/resourcemanager.projectIamAdmin" -> null
- members = [
- "serviceAccount:terraform@test-tf-sa.iam.gserviceaccount.com",
] -> null
- project = "test-tf-sa" -> null
- role = "roles/resourcemanager.projectIamAdmin" -> null
}
# google_service_account.terraform will be destroyed
- resource "google_service_account" "terraform" {
- account_id = "terraform" -> null
- disabled = false -> null
- display_name = "terraform service account" -> null
- email = "terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- id = "projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- name = "projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com" -> null
- project = "test-tf-sa" -> null
- unique_id = "107173424725895843752" -> null
}
Plan: 0 to add, 0 to change, 4 to destroy.
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Destroying... [id=test-tf-sa/roles/resourcemanager.projectIamAdmin]
google_project_iam_binding.terraform["roles/editor"]: Destroying... [id=test-tf-sa/roles/editor]
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Destroying... [id=test-tf-sa/roles/iam.serviceAccountAdmin]
google_project_iam_binding.terraform["roles/resourcemanager.projectIamAdmin"]: Destruction complete after 10s
google_project_iam_binding.terraform["roles/iam.serviceAccountAdmin"]: Destruction complete after 10s
google_project_iam_binding.terraform["roles/editor"]: Still destroying... [id=test-tf-sa/roles/editor, 10s elapsed]
google_project_iam_binding.terraform["roles/editor"]: Destruction complete after 11s
google_service_account.terraform: Destroying... [id=projects/test-tf-sa/serviceAccounts/terraform@test-tf-sa.iam.gserviceaccount.com]
google_service_account.terraform: Destruction complete after 1s
Destroy complete! Resources: 4 destroyed.
Problems
Cannot delete GKE
The impact of the Compute Engine default service account's removal from the IAM principals became visible at this point.
The GKE cluster cannot be deleted; the attempt fails with this error:
Google Compute Engine: Required 'compute.instanceGroups.update' permission for 'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp'.
$ gcloud container clusters delete cluster-1 --zone=us-central1-c
The following clusters will be deleted.
- [cluster-1] in [us-central1-c]
Do you want to continue (Y/n)? Y
Deleting cluster cluster-1...done.
ERROR: (gcloud.container.clusters.delete) Some requests did not succeed:
- args: ['Operation [<Operation\n clusterConditions: [<StatusCondition\n canonicalCode: CanonicalCodeValueValuesEnum(PERMISSION_DENIED, 7)\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">]\n detail: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n endTime: \'2022-01-14T00:20:54.190004708Z\'\n error: <Status\n code: 7\n details: []\n message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">\n name: \'operation-1642119632548-20038ec5\'\n nodepoolConditions: []\n operationType: OperationTypeValueValuesEnum(DELETE_CLUSTER, 2)\n selfLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/operations/operation-1642119632548-20038ec5\'\n startTime: \'2022-01-14T00:20:32.548792723Z\'\n status: StatusValueValuesEnum(DONE, 3)\n statusMessage: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n targetLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/clusters/cluster-1\'\n zone: \'us-central1-c\'>] finished with error: Google Compute Engine: Required \'compute.instanceGroups.update\' permission for \'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.']
exit_code: 1
Cannot create GKE
Try to create another GKE cluster.
A GKE cluster can no longer be created. This is the original issue, "GCP GKE - Google Compute Engine: Not all instances running in IGM", which I encountered and which led to this troubleshooting.
cluster-2
Google Compute Engine: Not all instances running in IGM after 18.798524988s. Expected 3, running 0, transitioning 3. Current errors: [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.instances.create' permission for 'projects/1079157603081/zones/us-central1-c/instances/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.create' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.disks.setLabels' permission for 'projects/1079157603081/zones/us-central1-c/disks/gke-cluster-2-default-pool-36522bb7-0vkl' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.use' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com'); [PERMISSIONS_ERROR]: Instance 'gke-cluster-2-default-pool-36522bb7-0vkl' creation failed: Required 'compute.subnetworks.useExternalIp' permission for 'projects/1079157603081/regions/us-central1/subnetworks/default' (when acting as '1079157603081@cloudservices.gserviceaccount.com') (truncated).
Attempts to fix
I tried the following measures, with no luck.
Reassign roles/Editor to the service account
GCP_PROJECT_ID=test-tf-sa
GCP_SVC_ACC="serviceAccount:1079157603081-compute@developer.gserviceaccount.com"
gcloud projects add-iam-policy-binding ${GCP_PROJECT_ID} \
--member=serviceAccount:${GCP_SVC_ACC} \
--role=roles/Editor
-----
ERROR: Policy modification failed. For a binding with condition, run "gcloud alpha iam policies lint-condition" to identify issues in condition.
ERROR: (gcloud.projects.add-iam-policy-binding) INVALID_ARGUMENT: Role roles/Editor is not supported for this resource.
Apply undelete service account
$ gcloud beta iam service-accounts undelete 109558708367309276392
restoredAccount:
email: 1079157603081-compute@developer.gserviceaccount.com
etag: MDEwMjE5MjA=
name: projects/test-tf-sa/serviceAccounts/1079157603081-compute@developer.gserviceaccount.com
oauth2ClientId: '109558708367309276392'
projectId: test-tf-sa
uniqueId: '109558708367309276392'
These attempts did not bring the Compute Engine default service account back to IAM principals.
Disable Compute Engine API
Tried to disable the Compute Engine API, but it cannot be disabled because the GKE nodes cannot be deleted.
Manually add back the service account
Manually added the Compute Engine default service account 1079157603081-compute@developer.gserviceaccount.com and granted it roles/editor. It now appears in the gcloud projects get-iam-policy command output, but the GKE cluster still cannot be deleted.
$ gcloud projects get-iam-policy $GCP_PROJECT_ID
bindings:
...
- members:
- serviceAccount:1079157603081-compute@developer.gserviceaccount.com <-----
- serviceAccount:service-1079157603081@compute-system.iam.gserviceaccount.com
role: roles/editor
...
etag: BwXVf9cVnaU=
version: 1
$ gcloud container clusters delete cluster-1 --zone=us-central1-c
The following clusters will be deleted.
- [cluster-1] in [us-central1-c]
Do you want to continue (Y/n)? Y
Deleting cluster cluster-1...done.
ERROR: (gcloud.container.clusters.delete) Some requests did not succeed:
- args: ['Operation [<Operation\n clusterConditions: [<StatusCondition\n canonicalCode: CanonicalCodeValueValuesEnum(PERMISSION_DENIED, 7)\n
message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">]\n
detail: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n
endTime: \'2022-01-14T00:33:38.746564953Z\'\n error: <Status\n code: 7\n details: []\n
message: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.">\n
name: \'operation-1642120382096-034b0eb7\'\n nodepoolConditions: []
\n operationType: OperationTypeValueValuesEnum(DELETE_CLUSTER, 2)\n
selfLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/operations/operation-1642120382096-034b0eb7\'\n
startTime: \'2022-01-14T00:33:02.096736326Z\'\n status: StatusValueValuesEnum(DONE, 3)\n
statusMessage: "Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'."\n
targetLink: \'https://container.googleapis.com/v1/projects/1079157603081/zones/us-central1-c/clusters/cluster-1\'\n
zone: \'us-central1-c\'>] finished with error: Google Compute Engine: Required \'compute.instanceGroups.update\' permission for
\'projects/1079157603081/zones/us-central1-c/instanceGroups/gke-cluster-1-default-pool-b54fa6be-grp\'.']
exit_code: 1
Another service account for GKE
Created another service account that has the compute.admin role and used it to create/delete the GKE cluster(s). However, once the Compute Engine default service account has been compromised, the "GCP GKE - Google Compute Engine: Not all instances running in IGM" issue keeps occurring.
Goal to achieve
Bring the Compute Engine default service account back into the IAM principals, as in the snapshot below, and be able to manage Compute Engine instances and GKE nodes.
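The cluster creation error above is raised "when acting as" 1079157603081@cloudservices.gserviceaccount.com, and the terraform destroy plan shows that same account being removed from roles/editor, so the Google APIs Service Agent may need the role back in addition to the Compute Engine default service account. The following is a hedged sketch of re-granting roles/editor to both accounts non-authoritatively. The resource and local names are hypothetical, it assumes the PROJECT_ID variable from the configuration above, and it assumes no google_project_iam_binding for roles/editor remains in the configuration.

```hcl
# Sketch only: re-grant roles/editor to the two Google-created accounts that
# appear to have lost it. Names such as "current" and "default_editors" are
# illustrative and not taken from the original configuration.
data "google_project" "current" {
  project_id = var.PROJECT_ID
}

data "google_compute_default_service_account" "default" {
  project = var.PROJECT_ID
}

locals {
  default_editor_members = {
    # <project-number>-compute@developer.gserviceaccount.com
    compute_default = "serviceAccount:${data.google_compute_default_service_account.default.email}"
    # <project-number>@cloudservices.gserviceaccount.com (Google APIs Service Agent)
    google_apis = "serviceAccount:${data.google_project.current.number}@cloudservices.gserviceaccount.com"
  }
}

# google_project_iam_member only adds the listed member to the role;
# existing members of roles/editor are left untouched.
resource "google_project_iam_member" "default_editors" {
  for_each = local.default_editor_members

  project = var.PROJECT_ID
  role    = "roles/editor"
  member  = each.value
}
```

Whether this is enough to make the cluster deletable again is not confirmed here; IAM changes can also take some time to propagate.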