  • Find the CSV files in the folder
  • List all the files inside the folder
  • Convert the files to JSON and save them in the same bucket

There are many CSV files like the one below:

emp_id,Name,Company
10,Aka,TCS
11,VeI,TCS
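
With `orient='index'` (as in my code below), the sample above becomes:

```python
from io import StringIO

import pandas as pd

sample = "emp_id,Name,Company\n10,Aka,TCS\n11,VeI,TCS\n"
print(pd.read_csv(StringIO(sample)).to_json(orient='index'))
# {"0":{"emp_id":10,"Name":"Aka","Company":"TCS"},"1":{"emp_id":11,"Name":"VeI","Company":"TCS"}}
```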

My code is below:

import boto3
import pandas as pd
def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket('testfolder')
    for file in my_bucket.objects.all():
        print(file.key)
    for csv_f in file.key:
        with open(f'{csv_f.replace(".csv", ".json")}', "w") as f:
            pd.read_csv(csv_f).to_json(f, orient='index')

I am not able to save the files back to the bucket; if I remove the bucket name, they save to the local folder instead. How do I save them back to the bucket?


1 Answer


You can check the following code:

from io import StringIO

import boto3
import pandas as pd

def lambda_handler(event, context):
    s3 = boto3.resource('s3')

    input_bucket = 'bucket-with-csv-file-44244'
    my_bucket = s3.Bucket(input_bucket)

    for file in my_bucket.objects.all():
        if file.key.endswith(".csv"):
            # pandas can read the s3:// URL directly (requires s3fs/fsspec)
            csv_f = f"s3://{input_bucket}/{file.key}"
            print(csv_f)

            json_file = file.key.replace(".csv", ".json")
            print(json_file)

            # Write the JSON to an in-memory buffer, then upload it back to S3
            json_buffer = StringIO()
            df = pd.read_csv(csv_f)
            df.to_json(json_buffer, orient='index')
            s3.Object(input_bucket, json_file).put(Body=json_buffer.getvalue())

Your Lambda layer will need to have:

fsspec
pandas
s3fs
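
Alternatively, if you want to avoid the s3fs/fsspec layer dependencies entirely, here is a minimal sketch (the bucket name is a placeholder) that reads each object's body with plain boto3 and does the CSV-to-JSON conversion in memory with pandas:

```python
from io import StringIO

import pandas as pd

def csv_to_json(csv_text):
    """Convert CSV text to a JSON string keyed by row index."""
    return pd.read_csv(StringIO(csv_text)).to_json(orient='index')

def lambda_handler(event, context):
    import boto3  # provided by the Lambda runtime

    s3 = boto3.resource('s3')
    bucket_name = 'bucket-with-csv-file-44244'  # placeholder: your bucket
    for obj in s3.Bucket(bucket_name).objects.all():
        if not obj.key.endswith('.csv'):
            continue
        # Read the object body directly -- no s3fs/fsspec needed
        csv_text = obj.get()['Body'].read().decode('utf-8')
        json_key = obj.key.replace('.csv', '.json')
        s3.Object(bucket_name, json_key).put(Body=csv_to_json(csv_text))
```

With this approach only pandas needs to go into the layer, since boto3 is already available in the Lambda environment.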
  • Can I ask what fsspec and s3fs are used for? – aysh Aug 11 '20 at 05:08
  • @aysh To read from s3. Pandas can read directly from s3. It should be able to write as well, but in my tests now, I didn't write. – Marcin Aug 11 '20 at 05:10
  • One last question: why do we need to use StringIO? Sorry if I am disturbing you. I didn't get the reason for the lines json_buffer = StringIO() and put(Body=json_buffer.getvalue()). – aysh Aug 11 '20 at 05:19
  • @aysh This is a workaround. Normally pandas should be able to write to s3, but in my tests it did not. Maybe you will have more luck. The alternative and more traditional way of writing to s3 is [here](https://stackoverflow.com/a/40615630/248823), which involves StringIO. – Marcin Aug 11 '20 at 05:22
  • If I want to save df.to_json(json_buffer, orient='index') to a different bucket, can I pass a parameter here, or do I need to create a function like upload_file() where I pass the bucket name? – aysh Aug 11 '20 at 08:00
  • @aysh In `s3.Object(input_bucket, json_file)` you can change `input_bucket` to something else. – Marcin Aug 11 '20 at 08:02
  • Can you answer this one: https://stackoverflow.com/questions/63618932/how-to-read-content-from-the-s3-bucket-as-url – aysh Aug 27 '20 at 15:07