
The DynamoDB point-in-time recovery export option under "Export and streams" dumps files in json.gz format when "DynamoDB JSON" is selected under the advanced settings. I am trying to convert those json.gz files to Parquet using Glue ETL Studio, but the job fails when I choose JSON as the input file type. What is the easiest way to dump DynamoDB data incrementally into Parquet format in S3 while avoiding out-of-memory issues (Lambda/Glue ETL)?
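For context, here is a minimal sketch of the conversion I am after, assuming the standard Glue PySpark boilerplate and reading the export with plain Spark JSON rather than a Glue Studio source node (the bucket name, export ID, and output path below are placeholders):

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Spark decompresses .json.gz transparently, and each line of the export is a
# standalone JSON object, so the files can be read as JSON Lines directly.
# Every line wraps one item in DynamoDB's attribute-type notation, e.g.:
#   {"Item": {"pk": {"S": "user#1"}, "count": {"N": "42"}}}
df = spark.read.json(
    "s3://my-export-bucket/AWSDynamoDB/01234567890123-abcdefgh/data/"
)

# The attribute-type descriptors (S, N, B, ...) survive as nested struct
# columns; unwrapping them into plain values would be a separate step.
df.write.mode("overwrite").parquet("s3://my-export-bucket/parquet/mytable/")

job.commit()
```

Even with something like this, the type descriptors come through as nested struct columns, so a flattening step would still be needed before the Parquet output is directly queryable.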

androboy
  • When you say it is failing, what errors are you seeing? What happens if you use the Amazon ION format that DynamoDB can export with the same feature? (ION is a superset of JSON) – NoSQLKnowHow Jun 02 '21 at 17:45
  • I also need more information about what you are doing with Glue ETL Studio and how it compares to what is being done in this blog post for getting to Parquet format: https://aws.amazon.com/blogs/database/export-and-analyze-amazon-dynamodb-data-in-an-amazon-s3-data-lake-in-apache-parquet-format/ – NoSQLKnowHow Jun 02 '21 at 18:48

0 Answers