
I am trying to set up a fairly simple pipeline (I think) where I can upload a CSV file to an S3 bucket, and a Lambda function triggered on upload will take the file, read it with Pandas, and insert some of the data into a MySQL table running in RDS.

I have been following these two tutorials to learn how to do that:

I have created the Lambda function to run inside the same VPC (the default VPC) that RDS is running in. The problem is that, when testing the function with the same test event used in the first tutorial, I get a "Task timed out after 3.00 seconds" error. I found this SO post and this one, which suggest creating a VPC endpoint, and I followed the link provided by Mark B to create the endpoint. However, testing the function again gives a task timeout error. I have even followed this tutorial and created a bucket policy to allow access from the VPC endpoint, but to no avail.
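For reference, I created the gateway endpoint through the console, but it corresponds roughly to the call below (the VPC ID, route table ID, and region are placeholders, not my actual values):

import boto3

ec2 = boto3.client('ec2')

# Gateway endpoint so that traffic to S3 from the VPC's subnets is routed
# through the endpoint instead of requiring internet access.
ec2.create_vpc_endpoint(
    VpcEndpointType='Gateway',
    VpcId='vpc-xxxxxxxx',                      # default VPC (placeholder)
    ServiceName='com.amazonaws.us-east-1.s3',  # region is a placeholder
    RouteTableIds=['rtb-xxxxxxxx'],            # route table used by the Lambda subnets
)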

For the Lambda function I have created an execution role with the following policies:

  • AWSLambdaVPCAccessExecutionRole
  • AmazonS3FullAccess
  • the policy from the first tutorial:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:CreateLogGroup",
                "logs:CreateLogStream"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::*/*"
        }
    ]
}

and the code I use for the function is:

import json
import urllib.parse
import boto3

print('Loading function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        print("CONTENT TYPE: " + response['ContentType'])
        print("Response: ", response)
        
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
              

The test event I use for the function is the same as this one: https://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html#with-s3-example-test-dummy-event

In the Lambda function, the resource-based policy statements look like this: [screenshot of the resource-based policy statement]
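In case the screenshot is not visible: the statement is the one created when the S3 trigger was added, and it should be roughly equivalent to what a call like this would produce (the function name, account ID, and bucket name below are placeholders, not my actual values):

import boto3

lambda_client = boto3.client('lambda')

# Allow S3 to invoke the function when objects are created in the bucket.
lambda_client.add_permission(
    FunctionName='my-csv-function',        # placeholder
    StatementId='s3-invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::my-bucket',    # placeholder bucket
    SourceAccount='123456789012',          # placeholder account ID
)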

I would really appreciate any suggestions on how to move on from here.

Dr.Fykos
  • Is your lambda timing out when trying to read the file? Does the lambda IAM role have access to the specified S3 bucket? What is the size of the file in S3? Maybe the file is too large for lambda to process? (I am guessing this might not be the case since the timeout is 3 sec) – wolf Jul 31 '23 at 17:46
  • I updated my question with the policies I have attached to the execution role of the Lambda function. I don't think the problem is the file itself. I have done the first tutorial initially with the same file I am using now and the function worked before. – Dr.Fykos Jul 31 '23 at 18:17
  • Does your lambda security group allow egress to S3 (443) and MySQL? And your RDS allow ingress from the lambda? – nick18702 Jul 31 '23 at 18:46
  • Share the relevant lambda code – Paolo Jul 31 '23 at 18:57
  • Everyone's shooting in the dark here until you explain what API request (or other network request) the Lambda function is making that is actually causing the timeout. – jarmod Jul 31 '23 at 19:58
  • If the timeout is occurring during a call to S3, then it suggests that the VPC Endpoint is not working. Try running a simple Lambda function that calls `ListBuckets` to verify that it can communicate with S3. It is likely that the VPC Endpoint Policy is not configured correctly. See: [Control access to VPC endpoints using endpoint policies - Amazon Virtual Private Cloud](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-access.html#vpc-endpoint-policies) – John Rotenstein Jul 31 '23 at 23:53
  • Thank you all for your input. I have updated the question with the Lambda code and the test event, hope this is useful. I am not sure about @nick18702's comment, but I will investigate. – Dr.Fykos Aug 01 '23 at 15:12
  • You can check that your Lambda function has the correct permissions quickly by deploying it outside of your VPC (i.e. don't select a subnet when configuring), then trigger it to execute, then check the CloudWatch Logs. I'm guessing that will work correctly and that the problem is your S3 endpoint config, so you should add that to your post. – jarmod Aug 01 '23 at 16:21
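Following John Rotenstein's suggestion above, a minimal test function that only calls ListBuckets might look like the sketch below; if this also times out inside the VPC, the problem is the endpoint/routing rather than the object permissions (no bucket names or event data are assumed):

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # No event data or bucket names are needed; this only checks whether
    # the function can reach the S3 API at all from inside the VPC.
    response = s3.list_buckets()
    names = [bucket['Name'] for bucket in response['Buckets']]
    print("Buckets visible to this role:", names)
    return names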

1 Answer


There are a number of things to consider here, assuming the S3 configuration is correct:

  1. Is 3 seconds actually enough? The Lambda function is doing a lot: reading from S3 and writing to RDS. If the file is big, 3 seconds might not be enough and the function will time out. Maybe increase the timeout to 1 minute and re-test (a boto3 sketch follows this list).
  2. You don't mention RDS permissions. Have you given the Lambda function access to write to RDS? For example, is there a security group rule that allows it to connect to the database (also covered in the sketch below)?
  3. Also, even though RDS and Lambda are in the same VPC, check that the subnets can reach what they need. For example, does the route table of the subnet where Lambda runs have a VPC endpoint (or a route to the internet) configured?
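A rough boto3 sketch of points 1 and 2 (the same can be done in the console; the function name and security group IDs are placeholders, not values from the question):

import boto3

lambda_client = boto3.client('lambda')
ec2 = boto3.client('ec2')

# 1. Raise the function timeout from the 3-second default to 60 seconds.
lambda_client.update_function_configuration(
    FunctionName='csv-to-rds',   # placeholder function name
    Timeout=60,
)

# 2. Allow the Lambda function's security group to reach MySQL (port 3306)
#    on the RDS instance. Both security group IDs are placeholders.
ec2.authorize_security_group_ingress(
    GroupId='sg-0aaaaaaaaaaaaaaaa',   # RDS instance's security group
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 3306,
        'ToPort': 3306,
        'UserIdGroupPairs': [{'GroupId': 'sg-0bbbbbbbbbbbbbbbb'}],  # Lambda's security group
    }],
)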
skzi