
I am trying to export a DynamoDB table in JSON format to S3 and from there import it to BigQuery. The important part is exporting the DynamoDB table as JSON to S3, because the table I am working with is not small: it contains 5.6 million records, and about 15,000 new records (on a quiet day) are inserted every day. I came across a blog post suggesting a Lambda function (ref: http://randomwits.com/blog/export-dynamodb-s3), but table.scan() does not work well with large tables.

So how can I efficiently export a DynamoDB table in JSON format to S3 and from there import it to BigQuery? I saw some options like Hevo, Glue, etc., but I don't know which would be the most efficient.

  • DynamoDB has a new "export to S3" feature which offers a good solution, but in that case I would have to enable point-in-time recovery (PITR) for Amazon DynamoDB. I am unsure whether it will be worth it, i.e. whether the solution will be effective. – PurpleGreen Nov 18 '20 at 09:43
  • I would enable PITR, personally. Related: https://stackoverflow.com/questions/18896329/export-data-from-dynamodb and https://stackoverflow.com/questions/33357821/how-to-export-a-dynamodb-table-as-a-csv-through-aws-cli-without-using-pipeline. – jarmod Nov 18 '20 at 12:10
  • @M.EceErcan Please go through the below link. https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/ – Sarathy Velmurugan Feb 17 '21 at 06:25
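For reference, the native export the comments point to can be started from code with boto3's `export_table_to_point_in_time` (PITR must already be enabled on the table). A minimal sketch, assuming boto3 is available and using placeholder table ARN and bucket names:

```python
def build_export_request(table_arn: str, bucket: str, prefix: str = "exports/") -> dict:
    """Assemble the parameters for a DynamoDB point-in-time export to S3."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        # DYNAMODB_JSON keeps DynamoDB's typed attribute format; ION is the alternative.
        "ExportFormat": "DYNAMODB_JSON",
    }

def export_table_to_s3(table_arn: str, bucket: str) -> dict:
    """Kick off the export; returns an ExportDescription with the export status."""
    import boto3  # assumed available in the runtime environment
    client = boto3.client("dynamodb")
    return client.export_table_to_point_in_time(**build_export_request(table_arn, bucket))
```

The export runs server-side, so it avoids scanning 5.6 million items through a Lambda; the output lands in S3 as gzipped DynamoDB-JSON files under the given prefix.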

1 Answer


You can do this with AWS Lambda: the Lambda function is triggered by the DynamoDB stream and writes the records to Cloud Logging; from Cloud Logging you then create a sink with BigQuery as the destination.
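A rough sketch of what that stream-triggered Lambda could look like — the handler converts DynamoDB's typed attribute values into plain JSON lines before logging them (forwarding those logged lines on to Cloud Logging / BigQuery is a separate step not shown here):

```python
import json

def plain(attr):
    """Convert one DynamoDB-typed attribute value (e.g. {"S": "x"}) to plain Python."""
    (t, v), = attr.items()
    if t == "S":
        return v
    if t == "N":
        return float(v) if "." in v else int(v)
    if t == "BOOL":
        return v
    if t == "NULL":
        return None
    if t == "M":
        return {k: plain(x) for k, x in v.items()}
    if t == "L":
        return [plain(x) for x in v]
    return v  # binary and set types left as-is for brevity

def handler(event, context=None):
    """Lambda entry point: emit each inserted/updated item as one JSON line.

    Printed lines go to the function's log stream, from where a
    subscription or sink can forward them toward BigQuery.
    """
    lines = []
    for record in event.get("Records", []):
        image = record["dynamodb"].get("NewImage")
        if image:  # absent on REMOVE events
            line = json.dumps({k: plain(v) for k, v in image.items()})
            print(line)
            lines.append(line)
    return lines
```

The `NewImage` key is only present when the stream is configured with a view type that includes the new item (e.g. `NEW_IMAGE` or `NEW_AND_OLD_IMAGES`).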

Levi
  • That will help export *new* data, but not the existing data. – jarmod Nov 18 '20 at 12:08
  • Then you can utilize dynamodb export to s3, then query the data using athena, the query results can be put on a new bucket -> AWS Lambda -> Cloud Logging -> Sink to BQ https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DataExport.Using.html – Levi Nov 18 '20 at 13:00
  • Right, I'm just pointing out that your answer addresses the change data capture, but not the original data. – jarmod Nov 18 '20 at 13:20
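The export-then-Athena path from the comment thread could be started roughly like this, assuming the S3 export has already been registered as an Athena table; the database, table, and bucket names are placeholders:

```python
def build_unload_query(source_table: str, results_path: str) -> str:
    """Athena UNLOAD writes the query result to S3 as JSON lines,
    which is a convenient shape for loading into BigQuery."""
    return (
        f"UNLOAD (SELECT * FROM {source_table}) "
        f"TO '{results_path}' "
        "WITH (format = 'JSON')"
    )

def run_athena_query(query: str, output_location: str, workgroup: str = "primary") -> dict:
    """Submit the query; Athena runs it asynchronously and returns a QueryExecutionId."""
    import boto3  # assumed available at runtime
    athena = boto3.client("athena")
    return athena.start_query_execution(
        QueryString=query,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )
```

This covers the "export to S3, then query with Athena, then put results in a new bucket" part of the pipeline; the Lambda from the answer above would handle the ongoing change data capture.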