
I have multiple large CSV files in an S3 bucket and I want to write their data to a DynamoDB table. The issue is that my Lambda function runs for more than 15 minutes and hits the timeout error without completely writing the CSV file to DynamoDB. So is there a way to split the CSV into smaller parts?

Things I've tried so far

this - It doesn't re-invoke itself as it is supposed to (it writes a few lines to the table and then stops without any errors).
aws document - Gives an "s3fs module not found" error. I tried many things to make it work but couldn't.

Is there any way I can accomplish this?

Thank You

Cee Jay

2 Answers


I think the fan-out approach from your linked solution should be the best option.

Use a main Lambda function that splits the processing by dividing the file into chunks of lines (e.g. 1,000 lines each) and fans them out as calls to your processing Lambda, invoked with the Event invocation type instead of RequestResponse. Each processing Lambda should then read only the CSV lines assigned to it (have a look here).
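A rough sketch of such a coordinator, assuming a hypothetical worker function name (PROCESSOR_FUNCTION), a chunk size of 1,000 lines, and that the S3 bucket/key arrive in the triggering event:

```python
import json
import boto3

s3 = boto3.client("s3")
lambda_client = boto3.client("lambda")

CHUNK_SIZE = 1000  # lines per worker invocation (assumption)
PROCESSOR_FUNCTION = "csv-to-dynamodb-processor"  # hypothetical worker name


def handler(event, context):
    bucket = event["bucket"]
    key = event["key"]

    # Count the lines once so we know how many chunks to fan out.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    total_lines = sum(1 for _ in body.iter_lines())

    for start in range(0, total_lines, CHUNK_SIZE):
        payload = {
            "bucket": bucket,
            "key": key,
            "start_line": start,
            "end_line": min(start + CHUNK_SIZE, total_lines),
        }
        # InvocationType="Event" invokes the worker asynchronously,
        # so the coordinator does not wait for each chunk to finish.
        lambda_client.invoke(
            FunctionName=PROCESSOR_FUNCTION,
            InvocationType="Event",
            Payload=json.dumps(payload),
        )
```

Each worker would then skip to its start_line, stop at end_line, and write only that slice to DynamoDB.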

If you have already tried this, could you post parts of your solution?

tpschmidt

I was able to (partly) fix my problem by increasing the write capacity on DynamoDB to a minimum of 1,000. With that I could write 1 million records in 10 minutes, although I still needed to split the CSV file. Using batch_write instead of writing each item line by line also helps tremendously.
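For reference, a minimal sketch of the batch-write part with boto3 (the table name, the bucket/key fields in the event, and the assumption that the CSV header names match the table's attribute names are all hypothetical):

```python
import csv
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")  # hypothetical table name


def handler(event, context):
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    lines = (line.decode("utf-8") for line in obj["Body"].iter_lines())
    reader = csv.DictReader(lines)

    # batch_writer buffers items and sends them as BatchWriteItem calls
    # of up to 25 items, retrying unprocessed items automatically.
    with table.batch_writer() as batch:
        for row in reader:
            batch.put_item(Item=row)
```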

Cee Jay