I have a Lambda function built as a NestJS microservice. It uses a database connection, and I fetch the connection details for it from a secrets service.

Here's my app module:

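import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
// ConfigModule, SecretsService and PropertyModule are this project's own
// classes; their import paths are not shown here.

@Module({
  imports: [
    ConfigModule,
    TypeOrmModule.forRootAsync({
      useClass: SecretsService,
      inject: [],
      imports: [ConfigModule],
    }),
    PropertyModule,
  ],
})
export class AppModule {}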

And this is the SecretsService (part of the ConfigModule):

import { Injectable } from '@nestjs/common';
import { SecretsManager } from 'aws-sdk';
import { GetSecretValueResponse } from 'aws-sdk/clients/secretsmanager';
import { MysqlConnectionOptions } from 'typeorm/driver/mysql/MysqlConnectionOptions';

@Injectable()
export class SecretsService /*  */ {

  private secretsManager: SecretsManager;

  constructor() {
    this.secretsManager = new SecretsManager();
  }

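  // Called by TypeOrmModule.forRootAsync({ useClass }) to build the connection options.
  async createTypeOrmOptions(): Promise<MysqlConnectionOptions> {
    console.log('before getting secret');
    const { SecretString }: GetSecretValueResponse =
      await this.secretsManager.getSecretValue({ SecretId: 'rds/prod' }).promise();
    // SecretString is optional in the response type, so guard before parsing.
    if (!SecretString) {
      throw new Error('Secret rds/prod has no SecretString');
    }
    const secret = JSON.parse(SecretString);
    // Log progress only; keep the secret value itself out of the logs.
    console.log('after getting a secret');

    return {
      /* database config */
    };
  }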
}

It turns out that the code doesn't always reach the “after getting a secret” part. Here is what I observe:

I change something in the code and deploy a new version of the Lambda, and it just hangs at “before getting secret” forever. I wait 5 minutes and invoke the function again, then I wait 10 minutes. Same result.

Then I wait about 20 minutes and the request slips through. After that, I can invoke the same function several times in a row and I see “after getting a secret” every time.

So in fact it doesn't fail periodically; it works periodically. It looks like some sort of throttling and/or caching, but I don't see anything like that in the code.

Please help me solve this issue. How can I get my secrets every time I want them?

Konstantin Bodnia
  • I wasn't able to find any documentation about the throttling of the secrets manager calls. Or any built-in caching system that can affect different deployments. – Konstantin Bodnia Aug 23 '21 at 19:13
  • I did some extra debugging and found out that it fails with a `Socket timed out without establishing a connection` error. – Konstantin Bodnia Aug 24 '21 at 06:32

2 Answers

The Lambda was attached to three subnets: one public and two private. In the end, it only worked in one of them, because the others were poorly configured by our cluster architect. That would explain why the call only got through some of the time: it depended on which subnet a given execution environment ended up in.

It took me ages to get to the root of the problem. Check carefully how your network is configured.
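
A quick way to narrow down this kind of problem, offered as a rough sketch rather than a definitive diagnostic, is to open a plain TCP connection to the Secrets Manager endpoint from inside the function with a short deadline. `probeTcp` and `ProbeResult` are hypothetical names made up for this example. It uses only Node's built-in `net` module.

```typescript
import * as net from 'net';

// Hypothetical result shape for this sketch.
interface ProbeResult {
  ok: boolean;        // true if the TCP handshake completed
  elapsedMs: number;  // time until success, error, or deadline
  error?: string;     // reason when ok is false
}

// Try to establish a TCP connection to host:port and give up after timeoutMs.
// Always resolves (never rejects), so the result is easy to log.
function probeTcp(host: string, port: number, timeoutMs: number): Promise<ProbeResult> {
  return new Promise((resolve) => {
    const started = Date.now();
    const socket = net.connect({ host, port });
    let settled = false;

    const finish = (ok: boolean, error?: string): void => {
      if (settled) return; // ignore late events after the first outcome
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve({ ok, elapsedMs: Date.now() - started, error });
    };

    const timer = setTimeout(
      () => finish(false, `no TCP connection within ${timeoutMs} ms`),
      timeoutMs,
    );
    socket.once('connect', () => finish(true));
    socket.once('error', (err: Error) => finish(false, err.message));
  });
}
```

Assuming the standard regional endpoint, calling something like `probeTcp(`secretsmanager.${process.env.AWS_REGION}.amazonaws.com`, 443, 3000)` at the start of the handler and logging the result should show fairly quickly whether the function can reach the API at all. Attaching the Lambda to one subnet at a time and repeating the probe is one way to find out which subnets lack a route (NAT or VPC endpoint).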

Konstantin Bodnia
  • Related answer on Lambda [Intermittent Connectivity](https://stackoverflow.com/questions/52992085/why-cant-an-aws-lambda-function-inside-a-public-subnet-in-a-vpc-connect-to-the/52994841#52994841) here. – jarmod May 04 '22 at 14:36

You should use client-side caching and backoff/retry when accessing Secrets Manager from AWS Lambda.

For more, see Secrets Manager Best Practices.
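
As a minimal sketch of what that could look like (this is not an AWS-provided API; `withRetry`, `cached` and `Fetcher` are hypothetical helpers written for this example), you can wrap whatever function fetches the secret in a small TTL cache with exponential backoff:

```typescript
// Any async function that produces the value to cache, e.g. a secret lookup.
type Fetcher<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call fetch up to maxAttempts times, doubling the delay after each failure.
async function withRetry<T>(
  fetch: Fetcher<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fetch();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt); // 100 ms, 200 ms, ...
      }
    }
  }
  throw lastError;
}

// Return a function that serves the last fetched value until ttlMs has passed.
// `now` is injectable only to make the cache easy to test.
function cached<T>(
  fetch: Fetcher<T>,
  ttlMs: number,
  now: () => number = Date.now,
): Fetcher<T> {
  let entry: { value: T; expiresAt: number } | undefined;
  return async () => {
    if (entry && entry.expiresAt > now()) {
      return entry.value; // still fresh: no network call
    }
    const value = await withRetry(fetch);
    entry = { value, expiresAt: now() + ttlMs };
    return value;
  };
}
```

In a Lambda you would create the cached getter once at module scope, for example `const getDbSecret = cached(() => secretsManager.getSecretValue({ SecretId: 'rds/prod' }).promise(), 5 * 60 * 1000)`, so that warm invocations reuse the value. The five-minute TTL is an arbitrary assumption. Note that caching and retries reduce the number of calls but will not help if the function cannot reach the endpoint at all.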

jarmod
  • I was of course going to wrap it in a cache once I'm done developing. But it says that I “should” use it. It never says that it is a must. It doesn't throw any 429 error, it just hangs forever. – Konstantin Bodnia Aug 23 '21 at 19:02
  • Is this Lambda configured for VPC? If yes, are you using NAT or a VPC Endpoint to route to the Secrets Manager API endpoint? Also, I've seen numerous, perhaps historical, delay issues related to dual stack and the use of IPv6. I just mention that in case it triggers something. I should also mention this [Lambda Extensions technique](https://developer.squareup.com/blog/using-aws-lambda-extensions-to-accelerate-aws-secrets-manager-access/). – jarmod Aug 23 '21 at 19:21
  • It was an issue with the subnets' configuration. It took me a lot of digging to find it. – Konstantin Bodnia Aug 24 '21 at 10:17