Some pods in my GKE Autopilot cluster aren't able to grab the Application Default Credentials to call other GCP services.

When I apply a new deployment, 1 or 2 of the 3 pods fail to authenticate using the googleapis (google-auth-library) npm package. I have tried both v73.0.0 and the latest, v84.0.0.

I get:

    Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
        at GoogleAuth.getApplicationDefaultAsync (/node_modules/google-auth-library/build/src/auth/googleauth.js:173:19)

I am using this code and retrying on failure:

    const {google} = require('googleapis');

    // Resolve after the given number of milliseconds
    // (helper not shown in the original snippet).
    const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

    const setGoogleAuth = async () => {
        try {
            const auth = new google.auth.GoogleAuth({
                // Scopes can be specified either as an array or as a single, space-delimited string.
                scopes: ['https://www.googleapis.com/auth/cloud-platform'],
            });

            // Acquire an auth client, and bind it to all future calls
            const authClient = await auth.getClient();
            google.options({auth: authClient});
        } catch (e) {
            console.error(e);

            // Sleep for 3 seconds, then retry
            await sleep(3000);
            await setGoogleAuth();
        }
    };
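
The retry above recurses without a limit, so a pod that never obtains credentials loops forever. A minimal sketch of a bounded retry with exponential backoff is below; `retryWithBackoff`, `attempts`, and `baseDelayMs` are hypothetical names for illustration, not part of googleapis or google-auth-library.

```javascript
// Resolve after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call fn until it succeeds or the attempts run out.
// The wait doubles after each failure: baseDelayMs, 2x, 4x, ...
async function retryWithBackoff(fn, {attempts = 5, baseDelayMs = 500} = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Do not sleep after the final failed attempt.
      if (attempt < attempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}
```

With this in place, the credential lookup could be wrapped as `await retryWithBackoff(() => auth.getClient())`, and the pod can exit or report unhealthy if the final attempt still fails.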

Calling the metadata server manually from a pod that fails to authenticate with the googleapis package returns a valid token:

    curl http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=<my-gcp-endpoint>

Sometimes killing the pods and letting them be recreated works (using Horizontal Pod Autoscaler). Other times, the deployment has no problems at all. And at times, killing the pods so they are recreated doesn't help. The behaviour seems very non-deterministic.

Any help would be appreciated, thank you!

  • Try this [way](https://stackoverflow.com/questions/64723213/passing-gcp-service-account-key-to-gke-pods/64724609#64724609) or even better [workload identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity). – dany L Aug 19 '21 at 23:04
  • The cluster is using workload identity; at times the pods work with no issues. – Mike Gindin Aug 20 '21 at 13:48
  • Maybe this can give you some ideas to test: https://github.com/googleapis/google-auth-library-nodejs/issues/798, where they discuss adjusting the timeout settings. – dany L Aug 20 '21 at 14:07

1 Answer

Setting either DETECT_GCP_RETRIES=3 or K_SERVICE=true as an environment variable on the pods fixed this for me.

See full GitHub issue discussion here: https://github.com/googleapis/google-auth-library-nodejs/issues/1236
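
The cleanest place to set the variable is the `env` section of the Deployment's container spec. As a rough sketch, assuming the variable is read when the library first probes for the metadata server, a default can also be applied in Node before the first auth call. `ensureMetadataRetries` is a hypothetical helper name, not a library function.

```javascript
// Hypothetical helper: default DETECT_GCP_RETRIES if the pod spec did not set it.
// Assumption: the variable is read at metadata detection time, so this must
// run before the first call such as auth.getClient().
function ensureMetadataRetries(env = process.env, retries = 3) {
  if (!env.DETECT_GCP_RETRIES) {
    // Environment variables are strings, so store the count as one.
    env.DETECT_GCP_RETRIES = String(retries);
  }
  return env.DETECT_GCP_RETRIES;
}

// Apply the default at startup, before creating any GoogleAuth client.
ensureMetadataRetries();
```

A value already present in the environment is left untouched, so the Deployment manifest remains the source of truth.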
