
I am trying to read the content of a CSV file that was uploaded to an S3 bucket. To do so, I get the bucket name and the file key from the event that triggered the Lambda function and read the file line by line. Here is my code:

import json
import os
import boto3
import csv

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        file_key = record['s3']['object']['key']
        s3 = boto3.client('s3')
        csvfile = s3.get_object(Bucket=bucket, Key=file_key)
        csvcontent = csvfile['Body'].read().split(b'\n')
        data = []
        with open(csvcontent, 'r') as csv_file:
          csv_file = csv.DictReader(csv_file)
          data = list(csv_file)

The exact error I’m getting in CloudWatch is:

[ERROR] TypeError: expected str, bytes or os.PathLike object, not list
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 19, in lambda_handler
    with open(csvcontent, 'r') as csv_file:

Could someone help me fix this? I appreciate any help you can provide, as I am new to Lambda.

John Rotenstein
Angara kilkiri
    `csvcontent` already contains your data. No need to open a file. `csvcontent` is actually a list of strings (lines) that you can parse. – balderman Jul 02 '19 at 09:31

3 Answers


To get the CSV file data from the S3 bucket in a format that is easy to index, the code below helped me a lot:

import boto3
import csv

key = 'key-name'
bucket = 'bucket-name'
s3_resource = boto3.resource('s3')
s3_object = s3_resource.Object(bucket, key)

data = s3_object.get()['Body'].read().decode('utf-8').splitlines()

lines = csv.reader(data)
headers = next(lines)
print('headers: %s' %(headers))
for line in lines:
    # print the complete line
    print(line)
    # print individual fields by index
    print(line[0], line[1])
nikita91000
  • do you know how to then get that csv file in lambda as the data frame? – Kalenji Nov 01 '20 at 21:01
  • Note if you are still getting weird characters using utf-8, try utf-8-sig as it reads the byte order mark as info instead of a string. See https://stackoverflow.com/questions/57152985/what-is-the-difference-between-utf-8-and-utf-8-sig – deesolie Jul 02 '21 at 03:34
csvfile = s3.get_object(Bucket=bucket, Key=file_key)
csvcontent = csvfile['Body'].read().split(b'\n')

Here you have already retrieved the file contents and split them into lines. There's no need to open anything again; you can pass csvcontent straight into your reader:

csv_data = csv.DictReader(csvcontent)
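One caveat: csvcontent here came from splitting raw bytes, so each element is a bytes object, and in Python 3 csv.DictReader expects text. Decoding before splitting avoids that problem. A minimal self-contained sketch (the sample data is made up for illustration, standing in for the S3 response body):

```python
import csv

# Stand-in for csvfile['Body'].read() from S3 (hypothetical sample data)
raw = b"name,age\nalice,30\nbob,25\n"

# Decode the bytes to text first, then split into lines
csvcontent = raw.decode('utf-8').splitlines()

# DictReader accepts any iterable of text lines
csv_data = csv.DictReader(csvcontent)
rows = list(csv_data)
print(rows)
```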
marc_s
tzaman

csvfile['Body'] is a StreamingBody, so you can't use it with the open ... with statement.

This line has already read all the data from the stream:

csvcontent = csvfile['Body'].read().split(b'\n')

So just parse those lines to extract the content you need.
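For example, decoding and splitting each raw line could look like this (the sample bytes are made up for illustration, standing in for the stream's contents):

```python
# Stand-in for csvfile['Body'].read() (hypothetical sample data)
raw = b"id,value\n1,foo\n2,bar\n"

# Split the raw bytes into lines, as in the question
csvcontent = raw.split(b'\n')

# Decode each non-empty line and split it on commas
parsed = [line.decode('utf-8').split(',') for line in csvcontent if line]
print(parsed)
```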

youDaily