
After connecting to the requisite AWS resources at the beginning of my Lambda function, I have a lambda_handler function that looks like the following:

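# conn (the RDS connection) and dst (the destination S3 bucket) are created at
# module level, outside the handler; Lambda calls the handler with only (event, context).
def lambda_handler(event, context):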

    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    print('Bucket: %s' % bucket)
    print('Object key: %s' % key)

    crm_file_name = key.split('/')[-1]
    crm_query = make_crm_db_query(crm_file_name)
    cur = conn.cursor()
    status = cur.execute(crm_query)

    if status == 1:
        details = cur.fetchone()
        opportunity_id = details[0]

        tmp = dst.get_key('%s/%s' % (opportunity_id, crm_file_name))
        print('starting API request...')
        s = requests.Session()
        r = s.post('http://link/to/endpoint/',
                   files={'pdf': tmp}, data={'opportunity_id': opportunity_id})
        print(r)
        print(r.content)
    else:
        print('not the right file type')

In my development environment this returns the following, indicating that the post was successful:

starting API request...
<Response [201]>
{"opportunity_id":253,"pdf":"https://s3.storage.asset.com:443/253/253___PDF.pdf?Signature=[CONFIDENTIAL STUFF HERE ;)]"}

In the AWS CloudWatch logs, however, the process hangs when it attempts the POST request. Here is a log sample:

starting API request...
END RequestId: beedb0c4-ce07-11e6-a715-53b3bd8edccc
REPORT RequestId: beedb0c4-ce07-11e6-a715-53b3bd8edccc  Duration: 30002.89 ms   Billed Duration: 30000 ms Memory Size: 128 MB   Max Memory Used: 22 MB  
2016-12-29T20:46:24.356Z beedb0c4-ce07-11e6-a715-53b3bd8edccc Task timed out after 30.00 seconds 

The S3 bucket, API endpoint, and RDS all belong to the same VPC. The process works in dev but hangs in production. Any pointers on how to debug this?
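One way to narrow down a hang like this is to replace the open-ended wait with a short, explicit timeout, so the failure is logged well before Lambda's 30-second limit. The sketch below uses only the Python standard library; `probe_tcp` is a hypothetical helper, and the host and port you pass it are assumptions to fill in for your own setup. The same idea applies to the original call, since `requests` also accepts a `timeout` keyword argument on `post`.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection to host:port within `timeout` seconds.

    Returns a dict describing the outcome instead of raising, so the result
    can simply be printed to the Lambda (CloudWatch) log.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and tries each address in turn,
        # giving up on each attempt after `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            peer_ip = sock.getpeername()[0]
        return {"ok": True, "peer": peer_ip,
                "seconds": round(time.monotonic() - start, 3)}
    except socket.timeout:
        # Typical symptom when packets are silently dropped, e.g. no route
        # out of the VPC or a security group that blocks the traffic.
        return {"ok": False, "error": "timeout",
                "seconds": round(time.monotonic() - start, 3)}
    except OSError as exc:
        # Refused connections, DNS failures, unreachable networks, etc.
        return {"ok": False, "error": type(exc).__name__,
                "seconds": round(time.monotonic() - start, 3)}
```

Calling something like `print(probe_tcp('api.example.internal', 80))` at the top of the handler (the hostname is a placeholder) shows in CloudWatch whether the endpoint is reachable at all, and which IP address the connection actually went to.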

I checked this post, which says that connections to external internet resources require a NAT gateway, but our API endpoint is running on an EC2 instance within the same VPC. Does AWS think that we are still trying to establish an external connection because we are making API calls? How do I debug this?
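Whether traffic stays inside the VPC depends on the IP address the API hostname resolves to, not on where the server physically runs: a public address is routed toward the internet even if it belongs to an EC2 instance in the same VPC. A minimal, standard-library sketch for checking this from inside the function (`classify_endpoint` is a hypothetical helper name):

```python
import ipaddress
import socket


def classify_endpoint(host, port=80):
    """Resolve `host` and label each resulting address as private or public."""
    results = {}
    for info in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        # info[4] is the sockaddr tuple; its first element is the IP string.
        # Strip any IPv6 scope suffix such as "%eth0" before parsing.
        ip = info[4][0].split("%")[0]
        results[ip] = "private" if ipaddress.ip_address(ip).is_private else "public"
    return results
```

Running it inside the Lambda function and printing the result shows the view from the VPC, which can differ from what the development machine sees.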

Asked by aaron; edited by Cœur.
  • There's not really enough to go off of here from an outsider's perspective. I would just double-check the differences between your dev and prod setups and make sure any assumptions you've made are correct. – Jack Dec 29 '16 at 21:29
  • You need to make sure you are using the private IP of the EC2 server the API is running on. If you are using the public IP, it will be treated as a resource that exists outside the VPC. Also, did you open up the security group that the EC2 server belongs to, to allow access from the Lambda function (via the ID of the security group the Lambda function belongs to)? – Mark B Dec 29 '16 at 21:47
  • @MarkB: This is really helpful, thank you; I was worried that it might be something like that. The problem is that we are running the application on an EC2 instance configured with Jenkins. The load balancer reads the header of incoming requests and then invokes the appropriate application based on the public domain name. If we switch to the private IP, the load balancer won't know which application to invoke. It sounds like we can either customize the load balancer rules (not sure how easy that is) or create a dedicated instance. What do you think? – aaron Dec 29 '16 at 22:08
  • That sounds correct. The other options are to add a NAT gateway to your VPC, or perhaps set up a private hosted zone in Route 53 that resolves those DNS names to the private IP internally. – Mark B Dec 29 '16 at 22:46
  • @MarkB: Awesome, thanks for helping me out with that. If you want to write this up as a quick answer, I'll be happy to accept it. – aaron Dec 30 '16 at 15:55
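The private hosted zone option keeps the public domain name (so name-based routing on the server still sees the same header) while resolving it to a private address inside the VPC. A rough sketch using a boto3 Route 53 client follows; the zone name, record name, VPC ID, region, and private IP are all hypothetical placeholders, and the VPC needs DNS support and DNS hostnames enabled for private zones to resolve.

```python
import time


def create_private_alias(route53, zone_name, vpc_id, vpc_region,
                         record_name, private_ip):
    """Create a private hosted zone for `zone_name`, associated with one VPC,
    and point `record_name` at `private_ip` for lookups made inside that VPC.

    `route53` is assumed to be a boto3 Route 53 client, e.g. boto3.client("route53").
    """
    zone = route53.create_hosted_zone(
        Name=zone_name,
        VPC={"VPCRegion": vpc_region, "VPCId": vpc_id},
        # CallerReference must be unique per request; a timestamp is a simple choice.
        CallerReference="private-zone-%d" % int(time.time() * 1000),
        HostedZoneConfig={"Comment": "internal view of %s" % zone_name,
                          "PrivateZone": True},
    )
    zone_id = zone["HostedZone"]["Id"]
    # UPSERT creates the A record, or overwrites it if it already exists.
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": private_ip}],
                },
            }]
        },
    )
    return zone_id
```

The private IP here should be that of the instance doing the name-based routing; clients outside the VPC keep resolving the name through the public zone as before.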

1 Answer


I encountered the same timeout problem; the reason is below.

From the AWS documentation:

When you add VPC configuration to a Lambda function, it can only access resources in that VPC. If a Lambda function needs to access both VPC resources and the public Internet, the VPC needs to have a Network Address Translation (NAT) instance inside the VPC.

Mark B's comment is right.

I advise following this blog post to set up a NAT.
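As a rough illustration of what that setup involves, the sketch below uses a boto3 EC2 client to create a NAT gateway and route the Lambda function's subnets through it. The subnet and route table IDs are hypothetical placeholders; the NAT gateway must live in a public subnet (one whose route table already sends 0.0.0.0/0 to an internet gateway), while the Lambda function's subnets use the private route table.

```python
def add_nat_gateway(ec2, public_subnet_id, private_route_table_id):
    """Give subnets associated with `private_route_table_id` outbound internet
    access through a new NAT gateway placed in `public_subnet_id`.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # A NAT gateway needs an Elastic IP address.
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(SubnetId=public_subnet_id,
                                 AllocationId=eip["AllocationId"])
    nat_id = nat["NatGateway"]["NatGatewayId"]
    # The gateway starts out "pending"; block until it is available.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    # Send all non-local traffic from the private subnets to the NAT gateway.
    ec2.create_route(RouteTableId=private_route_table_id,
                     DestinationCidrBlock="0.0.0.0/0",
                     NatGatewayId=nat_id)
    return nat_id
```

If the private route table already has a 0.0.0.0/0 route, `create_route` fails and `replace_route` with the same arguments would be needed instead.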

Answered by Jim; edited by Community.
  • Thanks, @Jimlin; you and Mark B are spot on. The VPC was the easiest way to solve this issue. – aaron Feb 27 '17 at 04:32