I'm a noob to AWS and Lambda, so I apologize if this is a dumb question. What I would like to do is upload a CSV to an S3 bucket, trigger a Lambda function on that upload, have Lambda load the CSV into pandas and do stuff with it, then write the dataframe back out as a CSV to a second S3 bucket.
I've read a lot about zipping a Python script together with its libraries and dependencies and uploading that as the deployment package, but that's a separate question. I've also figured out how to trigger Lambda when a file is uploaded to an S3 bucket and to automatically copy that file to a second S3 bucket.
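For context, the copy-only handler I have working now looks roughly like this (the bucket name is a placeholder for my real one):

import boto3

s3 = boto3.resource('s3')

def handler(event, context):
    record = event['Records'][0]['s3']
    src_bucket = record['bucket']['name']
    key = record['object']['key']
    # server-side copy of the uploaded object into the second bucket
    s3.Object('my-destination-bucket', key).copy_from(
        CopySource={'Bucket': src_bucket, 'Key': key})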
The part I'm having trouble finding any information on is the middle step: loading the file into pandas and manipulating it, all inside the Lambda function.
First question: is something like that even possible? Second question: how do I "grab" the file from the S3 bucket and load it into pandas? Would it be something like this?
import io

import boto3
import pandas as pd

s3 = boto3.resource('s3')

def handler(event, context):
    # bucket and key of the file that fired the event
    src_bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    # read the CSV straight out of S3 into a dataframe
    df = pd.read_csv(s3.Object(src_bucket, key).get()['Body'])
    # stuff to do with dataframe goes here
    # write the result back out as a CSV to the second bucket
    out = io.StringIO()
    df.to_csv(out, index=False)
    s3.Object('my-destination-bucket', key).put(Body=out.getvalue())
I pieced that together from bits of the boto3 docs, but it's a complete shot in the dark and I have no idea if it's even close to right. Any and all help would be really appreciated, because I'm pretty obviously out of my element!
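One wrinkle I came across while reading: apparently the object key in the S3 event notification is URL-encoded (a space in the filename shows up as a +), so I'm guessing the key has to be decoded before it's used, something like:

from urllib.parse import unquote_plus

key = unquote_plus(event['Records'][0]['s3']['object']['key'])

Does that apply here, or am I misunderstanding the event format?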