I have an S3 bucket and a Lambda function that should display the contents of a CSV file whenever one is uploaded to the bucket. The S3 bucket is already set as a trigger for my Lambda function. Can you please suggest how to do this?
- https://stackoverflow.com/a/58597953/9931092 – Amit Baranes Jan 25 '20 at 17:59
- Where is the Lambda function supposed to "display contents of csv file" exactly? – Mark B Jan 25 '20 at 18:03
- CloudWatch Logs – Champion Jan 25 '20 at 18:16
- What are you exactly asking here? You already asked how to read JSON; wouldn't this be the same way? https://stackoverflow.com/questions/59817196/how-to-read-a-json-file-present-in-s3-bucket-using-boto3 – James Z Jan 25 '20 at 19:46
- No, it is not the same way. – Champion Jan 26 '20 at 13:58
- Is there any way to read a CSV file from the S3 bucket, edit it by adding an extra column, and then put that file back into the S3 bucket using the same Lambda function? – Champion Jan 26 '20 at 15:15
1 Answer
An AWS Lambda function is code that you write. You can make it do anything you wish.
For your first scenario of displaying a CSV file in CloudWatch Logs, the Lambda function should (a short sketch follows this list):
- Retrieve the name of the bucket and object from the `event` passed to the Lambda function
- Download the file from Amazon S3 to the `/tmp/` directory
- Read the CSV using normal Python code and `print()` the information that you wish to appear in CloudWatch Logs
- Delete the temporary file, so as not to consume too much disk space (there is a limit of 512MB of temporary disk space, and Lambda containers can be reused multiple times)
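Here is a minimal sketch of that first scenario. It assumes an S3-triggered event and a local path of `/tmp/file.csv`; the structure mirrors the code later in this answer, but the specifics are illustrative:

```python
import csv
import os
import urllib.parse

import boto3

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Get the bucket and object key from the S3 event notification
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Download the file to Lambda's temporary storage
    local_filename = '/tmp/file.csv'
    s3_client.download_file(bucket, key, local_filename)

    # Print each row; print() output appears in CloudWatch Logs
    with open(local_filename, newline='') as f:
        for row in csv.reader(f):
            print(row)

    # Clean up so reused containers do not accumulate files
    os.remove(local_filename)
```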
For your second question of "adding an extra column", the Lambda function should:
- Retrieve the name of the bucket and object from the `event` passed to the Lambda function
- Download the file from Amazon S3 to the `/tmp/` directory
- Manipulate the contents of the file however you wish, using Python code
- Upload the file to Amazon S3
- Delete the temporary file, so as not to consume too much disk space (there is a limit of 512MB of temporary disk space, and Lambda containers can be reused multiple times)
The code would look something like:

```python
import urllib.parse

import boto3

# Connect to S3
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Get the bucket and object key from the Event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    localFilename = '/tmp/file.txt'

    # Download the file from S3 to the local filesystem
    s3_client.download_file(bucket, key, localFilename)

    # Do stuff here with the local file (your code here!)
    pass

    # Upload modified file
    s3_client.upload_file(localFilename, bucket, key)
```
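For the "do stuff" step in the middle, a minimal sketch of adding an extra column with the standard `csv` module could look like the following; the `add_column` helper, the column header, and its value are assumptions for illustration, not part of the original answer:

```python
import csv

def add_column(path, header='processed', value='yes'):
    # Read all rows of the downloaded CSV into memory
    with open(path, newline='') as f:
        rows = list(csv.reader(f))

    # Append the new column: a header cell on the first row,
    # then the same value on every data row
    if rows:
        rows[0].append(header)
        for row in rows[1:]:
            row.append(value)

    # Write the modified rows back over the same local file
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

Calling `add_column(localFilename)` between the download and upload steps above would then re-upload the CSV with the extra column.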

John Rotenstein
- Champion Jan 27 '20 at 02:59:

```python
import json
import boto3
import csv

def lambda_handler(event, context):
    s3 = boto3.client('s3')
    if event:
        file_obj = event['Records'][0]
        filename = str(file_obj['s3']['object']['key'])
        print(filename)
        fileobj = s3.get_object(Bucket="danvan", Key=filename)
        file_content = fileobj['Body'].read().decode('utf-8')
        print(file_content)
        print(type(file_content))
```
- The above code will read a CSV (or any other) file as and when it is uploaded to the S3 bucket and display its contents, but my question is: I want to add an extra column to my CSV file using the same Lambda function and then save it back to S3. – Champion Jan 27 '20 at 03:01
- My recommendation is to download the file, then modify it, then upload the file. For the middle part, you can write your own Python to modify the file however you wish. That part has nothing to do with Lambda, boto3 or AWS. – John Rotenstein Jan 27 '20 at 03:22
- Is it possible to share code on modifying a CSV file using Lambda and sending it back to S3? – Champion Jan 27 '20 at 07:23
- Actually, I am facing an issue while modifying a CSV file stored in S3, because the body has no write attribute. – Champion Jan 27 '20 at 07:45
- I have added some sample code showing how to download and upload the file. This is easier than using `body` and `streams`. You can add your own code in the middle that reads and modifies the local file. – John Rotenstein Jan 27 '20 at 10:40
- But I want to automate the process. Downloading the file to a local directory, making changes, and then uploading it will increase manual work. What I want here is that, as I read the file using `get_object`, I should be able to modify it using boto3 and send it back to S3, so that it works for all files that hit my S3 bucket. – Champion Jan 28 '20 at 02:03
- That is exactly what I am recommending. However, rather than using `get_object()`, my recommendation is to have the Lambda function download the file to local file storage within Lambda, then Lambda can perform the edit using code that you write, then upload the modified file. You will find this easier than manipulating the file as a stream. – John Rotenstein Jan 28 '20 at 02:19
- But there is no local storage within Lambda. Can you show it via code? I mean, how would you modify the file? – Champion Jan 28 '20 at 04:13
- AWS Lambda functions are given 512MB of storage in `/tmp/`. My code example above shows how to download the file from Amazon S3 that triggered the Lambda function and how to upload it again after it has been modified. The part that "modifies the file" has nothing to do with AWS or Lambda -- it is just normal code that you can write to do whatever you wish. You should start by writing it as a program on your own computer, then move that code into the Lambda function. – John Rotenstein Jan 28 '20 at 04:46
- Thanks John, it worked. Can you help me with how I can remove the file from Lambda's local storage? – Champion Jan 31 '20 at 04:09
- If you always save to the same local filename, then there is no need to remove the file, since it will be overwritten during the next execution. However, if you wish to delete it, you can use `os.remove(localFilename)` (remember to `import os` first). – John Rotenstein Jan 31 '20 at 04:22
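A minimal sketch of that cleanup pattern, wrapping the processing step in a `try`/`finally` (an assumption on my part, so the temporary file is removed even if processing fails):

```python
import os

def process_with_cleanup(local_filename):
    # Hypothetical wrapper: the download/modify/upload steps go in the
    # try block; the temporary file is removed even if they raise
    try:
        pass  # your download, modify, and upload code here
    finally:
        if os.path.exists(local_filename):
            os.remove(local_filename)
```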