63

Is it possible to export data from a DynamoDB table in some format?

The concrete use case is that I want to export data from my production DynamoDB database and import it into my local DynamoDB instance, so my application can work with a local copy of the data instead of the production data.

I use the standard DynamoDB-local as a local instance of DynamoDB.

Ruben Bartelink
Kristian Ačkar

21 Answers

66

This will export all items as JSON documents:

aws dynamodb scan --table-name TABLE_NAME > export.json
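
For reference, export.json will contain the raw Scan response in DynamoDB-typed JSON. A minimal sketch of its shape (the attribute names shown here are made up):

{
    "Items": [
        { "id": { "S": "123" }, "name": { "S": "example" } }
    ],
    "Count": 1,
    "ScannedCount": 1
}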

This script reads the full table from the remote DynamoDB and imports it into the local instance:

TABLE=YOURTABLE
maxItems=25
index=0

# Initial page: scan the remote table, reshape the items into
# batch-write-item format, and write them to the local instance
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000

# Keep paginating while the scan returns a NextToken
nextToken=$(echo "$DATA" | jq -r '.NextToken')
while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
  nextToken=$(echo "$DATA" | jq -r '.NextToken')
done

Here is a version of the script that uses files to keep the exported data on disk.

TABLE=YOURTABLE
maxItems=25
index=0

# Save each page of the scan to its own file on disk
DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems)
((index+=1))
echo "$DATA" > "$TABLE-$index.json"

nextToken=$(echo "$DATA" | jq -r '.NextToken')
while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
do
  DATA=$(aws dynamodb scan --table-name "$TABLE" --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" > "$TABLE-$index.json"
  nextToken=$(echo "$DATA" | jq -r '.NextToken')
done

# Convert each saved page into batch-write-item format and import it
for x in *"$TABLE"*.json; do
  jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" "$x" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000
done
Ivailo Bardarov
  • You can create items one at a time in AWS Console with this format if you select the "Text" option and check "DynamoDB Json." – Aposhian Jan 23 '20 at 15:22
  • Warning: If DynamoDB contains escaped characters (e.g. newline) this may not work in some shells – Woodz Jan 19 '21 at 06:18
  • @Woodz I have tested on bash; in which shell did you hit an error? – Ivailo Bardarov Jan 19 '21 at 17:51
  • 1
    @IvailoBardarov unfortunately the behaviour of bash (and bash builtin tools like echo - see https://stackoverflow.com/questions/8467424/echo-newline-in-bash-prints-literal-n#comment10474481_8467424) is not consistent across OSs such that on Intel MacOS 11.1, with a DynamoDB record that contains an escaped newline (`\n`), this script replaces the escaped newline with an actual newline in the JSON file - thereby breaking the `jq` call with: `parse error: Invalid string: control characters from U+0000 through U+001F must be escaped at line 2, column 23`. I solved with `sed` to double-escape `\n` – Woodz Jan 20 '21 at 02:38
  • If someone uses AWS NoSQL Workbench, it's important to run the batch-write-item commands with region=localhost, and also set environment variables for the correct credentials - otherwise you won't see the imported data... Another option is to start local DynamoDB with the -sharedDb flag, which spares all this configuration – Yitzchak Nov 15 '22 at 11:18
  • @Woodz does replacing ```echo $DATA``` with ```printf "%s" $DATA``` help? – Ivailo Bardarov Nov 24 '22 at 16:46
20

There is a tool named DynamoDBtoCSV that can be used to export all the data to a CSV file. However, for the other way around you will have to build your own tool. My suggestion is that you add this functionality to the tool and contribute it to the Git repository.


Another way is to use AWS Data Pipeline for this task (you will save the cost of reading the data from outside the AWS infrastructure). The approach is similar:

  1. Build the pipeline for output
  2. Download the file.
  3. Parse it with a custom reader.
Stéphane Bruckert
Dani C.
  • 1
    Interesting solution, would be interested to see that "custom reader", there is lots of noise due to the data types – Vadorequest Oct 23 '18 at 23:32
  • You can also look at this template to create and activate the pipeline. https://usecases.totalcloud.io/use-cases/aws-dynamodb-to-s3-exporter https://usecases.totalcloud.io/use-cases/activate-dynamodb-export-pipeline – Veer Abheek Singh Manhas Jun 25 '19 at 09:44
  • I wanted to suggest an edit to this post with an example of running it but it was rejected by the mods, thanks mods. Anyway here is a way to run the DynamoDbToCsv script. `sudo npm install -g dynamodbexportcsv; DynamoDbExportCsv --table "yourtable" --awsregion "us-east-1" --columns "col1,col2,col3" --scans 8 --gzip; cat *.gz > combined.csv.gz` – Colin D Dec 01 '20 at 20:12
19

Here is a way to export some data (often we just want to get a sample of our prod data locally) from a table using the aws cli and jq. Let's assume we have a prod table unsurprisingly called my-prod-table and a local table called my-local-table.

To export the data run the following:

aws dynamodb scan --table-name my-prod-table \
| jq '{"my-local-table": [.Items[] | {PutRequest: {Item: .}}]}' > data.json

Basically, we scan our prod table, transform the output of the scan into the format expected by batchWriteItem, and dump the result into a file.
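
For illustration, data.json ends up in the BatchWriteItem request shape; a minimal sketch (the attributes shown are hypothetical):

{
    "my-local-table": [
        { "PutRequest": { "Item": { "id": { "S": "123" }, "name": { "S": "example" } } } },
        { "PutRequest": { "Item": { "id": { "S": "456" }, "name": { "S": "another" } } } }
    ]
}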

To import the data into your local table run:

aws dynamodb batch-write-item \
--request-items file://data.json \
--endpoint-url http://localhost:8000

Note: There are some restrictions on the batch-write-item request - the BatchWriteItem operation can contain up to 25 individual PutItem and DeleteItem requests and can write up to 16 MB of data (the maximum size of an individual item is 400 KB).
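
If the scan returns more than 25 items, one way around this limit is to slice data.json into 25-item batches before importing. A rough sketch using bash and jq (not part of the original commands; it assumes the table name my-local-table used above):

# Split data.json into chunks of 25 PutRequests and import each chunk
total=$(jq '."my-local-table" | length' data.json)
for ((i=0; i<total; i+=25)); do
  jq "{\"my-local-table\": .[\"my-local-table\"][$i:$((i+25))]}" data.json > batch.json
  aws dynamodb batch-write-item \
    --request-items file://batch.json \
    --endpoint-url http://localhost:8000
done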

Valy Dia
13

Export it from the DynamoDB interface to S3.

Then convert it to JSON using sed:

sed -e 's/$/}/' -e $'s/\x02/,"/g' -e $'s/\x03/":/g' -e 's/^/{"/' <exported_table> > <exported_table>.json

Source

nostromo
5

I extended Valy Dia's solution to cover the whole process of exporting with only aws-cli and jq:

aws dynamodb scan --max-items 25 --table-name <TABLE_NAME> \
| jq '{"<TABLE_NAME>": [.Items[] | {PutRequest: {Item: .}}]}' > data.json

aws dynamodb describe-table --table-name <TABLE_NAME> | jq '.Table | {"TableName": .TableName, "KeySchema": .KeySchema, "AttributeDefinitions": .AttributeDefinitions, "ProvisionedThroughput": {
      "ReadCapacityUnits": 5,
      "WriteCapacityUnits": 5
}}' > table-definition.json

aws dynamodb create-table --cli-input-json file://table-definition.json  --endpoint-url http://localhost:8000 --region us-east-1

aws dynamodb batch-write-item --request-items file://data.json --endpoint-url http://localhost:8000

aws dynamodb scan --table-name <TABLE_NAME> --endpoint-url http://localhost:8000
Jorge Tovar
  • Windows Version aws dynamodb describe-table --no-verify-ssl --table-name | jq ".Table | {\"TableName\": .TableName, \"KeySchema\": .KeySchema, \"AttributeDefinitions\": .AttributeDefinitions, \"ProvisionedThroughput\": { \"ReadCapacityUnits\": 5, \"WriteCapacityUnits\": 5}, \"GlobalSecondaryIndexes\": {\"IndexName\": .IndexName, \"KeySchema\": .KeySchema, \"Projection\": .Projection, \"ProvisionedThroughput\": { \"ReadCapacityUnits\": 5, \"WriteCapacityUnits\": 5 }}}"
    – Sólon Soares Jun 28 '21 at 15:54
3

My answer is similar to Ivailo Bardarov's; if you plan to run this from a Linux instance, do the following:

1. Log in to your AWS account and go to IAM to create a user with a limited policy (for security purposes, of course). The policy should be limited to reading the DynamoDB table that you would like to back up (see the sketch after these steps).

2. Copy the access key and secret, and update the command below to run it on Linux (but make sure your table is not huge and won't create a disk-space issue on the box you are running this on):

AWS_ACCESS_KEY_ID='put_your_key' AWS_SECRET_ACCESS_KEY='put_your_secret' aws --region='put_your_region' dynamodb scan --table-name 'your_table_name'>> export_$(date "+%F-%T").json

Note: a similar command can be executed on Windows/PowerShell; I have not tested it, so I'm not adding it here.
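
A minimal sketch of such a read-only policy (the region, account ID, and table name below are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:DescribeTable", "dynamodb:Scan"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/your_table_name"
    }
  ]
}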

grepit
2

Try my simple node.js script dynamo-archive. It exports and imports in JSON format.

yegor256
2

I found the best current tool for simple import/exports (including round-tripping through DynamoDB Local) is this Python script:

https://github.com/bchew/dynamodump

This script supports schema export/import as well as data import/export. It also uses the batch APIs for efficient operations.

I have used it successfully to take data from a DynamoDB table to DynamoDB local for development purposes and it worked pretty well for my needs.

Jimmy S
2

Expanding on @Ivailo Bardarov's answer, I wrote the following script to duplicate tables from a remote DynamoDB to a local one:

#!/bin/bash
declare -a arr=("table1" "table2" "table3" "table4")
for i in "${arr[@]}"
do
    TABLE=$i
    maxItems=25
    index=0
    echo "Getting table description of $TABLE from remote database..."
    aws dynamodb describe-table --table-name $TABLE > table-description.json
    echo
    echo "Creating table $TABLE in the local database..."
    ATTRIBUTE_DEFINITIONS=$(jq .Table.AttributeDefinitions table-description.json)
    KEY_SCHEMA=$(jq .Table.KeySchema table-description.json)
    BILLING_MODE=$(jq .Table.BillingModeSummary.BillingMode table-description.json)
    READ_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.ReadCapacityUnits table-description.json)
    WRITE_CAPACITY_UNITS=$(jq .Table.ProvisionedThroughput.WriteCapacityUnits table-description.json)
    TABLE_DEFINITION=""

    if [[ "$READ_CAPACITY_UNITS" > 0 && "$WRITE_CAPACITY_UNITS" > 0 ]]
    then
    TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"ProvisionedThroughput\":{\"ReadCapacityUnits\":$READ_CAPACITY_UNITS,\"WriteCapacityUnits\":$WRITE_CAPACITY_UNITS}}"
    else
    TABLE_DEFINITION="{\"AttributeDefinitions\":$ATTRIBUTE_DEFINITIONS,\"TableName\":\"$TABLE\",\"KeySchema\":$KEY_SCHEMA,\"BillingMode\":$BILLING_MODE}"
    fi

    echo $TABLE_DEFINITION > create-table.json
    aws dynamodb create-table --cli-input-json file://create-table.json --endpoint-url http://localhost:8000
    echo "Querying table $TABLE from remote..."
    DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems)
    ((index+=1))
    echo "Saving remote table [$TABLE] contents to inserts.json file..."
    echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
    echo "Inserting rows to $TABLE in local database..."
    aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000

    nextToken=$(echo "$DATA" | jq -r '.NextToken')
    while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
    do
      echo "Querying table $TABLE from remote..."
      DATA=$(aws dynamodb scan --table-name $TABLE --max-items $maxItems --starting-token "$nextToken")
      ((index+=1))
      echo "Saving remote table [$TABLE] contents to inserts.json file..."
      echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.json
      echo "Inserting rows to $TABLE in local database..."
      aws dynamodb batch-write-item --request-items file://inserts.json --endpoint-url http://localhost:8000
      nextToken=$(echo "$DATA" | jq -r '.NextToken')
    done
done

echo "Deleting temporary files..."
rm -f table-description.json
rm -f create-table.json
rm -f inserts.json

echo "Database sync complete!"

This script loops over the string array and, for each table name, first gets the table description, builds a create-table JSON file with the minimum required parameters, and creates the table. It then uses the rest of @Ivailo Bardarov's logic to generate inserts and push them to the created table. Finally, it cleans up the generated JSON files.

Keep in mind, my purpose was to just create a rough duplicate (hence the minimum required parameters) of tables for development purposes.

Murat Aykanat
1

For those of you who would rather do this using Java, there is DynamodbToCSV4j.

JSONObject config = new JSONObject();
config.put("accessKeyId","REPLACE");
config.put("secretAccessKey","REPLACE");
config.put("region","eu-west-1");
config.put("tableName","testtable");
d2csv d = new d2csv(config);
quodlibet
  • Worked like a charm. However, pagination on last evaluated key has to be implemented in this library. – ZZzzZZzz Sep 08 '17 at 20:54
1

I have created a utility class to help developers with exports. It can be used if you don't want to use the Data Pipeline feature of AWS. A link to the GitHub repo is here.

1

Export the DynamoDB data to a JSON file on your local machine using the AWS CLI. Below is an example with filters:

aws dynamodb scan --table-name activities --filter-expression "Flag = :val" --expression-attribute-values "{\":val\": {\"S\": \"F\"}}" --select "SPECIFIC_ATTRIBUTES" --projection-expression "Status" > activitiesRecords.json

loakesh bachhu
1

Based on @IvailoBardarov's answer combined with @JorgeTovar's answer:

A shell script that fetches data from the remote DynamoDB and imports it into the local DynamoDB.

Extra features:

  • creates the local table
  • supports AWS NoSQL Workbench
# REMOTE - SOURCE OF DATA
TABLE=<TABLE_NAME>
PROFILE=<AWS_PROFILE>
REGION=us-east-1

# LOCAL - TARGET OF DATA
export AWS_ACCESS_KEY_ID=d7ac2v # Align the credentials with those that appear in NoSQL Workbench
export AWS_SECRET_ACCESS_KEY=4d0x6t

maxItems=25
index=0

# Create Local Table
aws dynamodb --profile $PROFILE --region $REGION describe-table --table-name $TABLE | jq ' .Table | {"TableName": .TableName, "KeySchema": .KeySchema, "AttributeDefinitions": .AttributeDefinitions,  "GlobalSecondaryIndexes": .GlobalSecondaryIndexes | map({"IndexName": .IndexName, "KeySchema": .KeySchema, "Projection": .Projection, "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}}), "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}}' > table-definition.json
aws dynamodb create-table --cli-input-json file://table-definition.json  --endpoint-url http://localhost:8000 --region localhost

# Import data
DATA=$(aws dynamodb --profile $PROFILE --region $REGION scan --table-name $TABLE --max-items $maxItems)
((index+=1))
echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000 --region localhost

# Keep paginating while the scan returns a NextToken
nextToken=$(echo "$DATA" | jq -r '.NextToken')
while [[ "$nextToken" != "" && "$nextToken" != "null" ]]
do
  DATA=$(aws dynamodb scan --profile $PROFILE --region $REGION --table-name $TABLE --max-items $maxItems --starting-token "$nextToken")
  ((index+=1))
  echo "$DATA" | jq ".Items | {\"$TABLE\": [{\"PutRequest\": { \"Item\": .[]}}]}" > inserts.jsons
  aws dynamodb batch-write-item --request-items file://inserts.jsons --endpoint-url http://localhost:8000 --region localhost
  nextToken=$(echo "$DATA" | jq -r '.NextToken')
done

Yitzchak
0

DynamoDB now has a native Export to S3 feature (in JSON and Amazon Ion formats): https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/
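
As a rough sketch, the export can also be triggered from the CLI (the table ARN and bucket below are placeholders, and point-in-time recovery must be enabled on the table):

aws dynamodb export-table-to-point-in-time \
  --table-arn arn:aws:dynamodb:us-east-1:123456789012:table/my-prod-table \
  --s3-bucket my-export-bucket \
  --s3-prefix exports/ \
  --export-format DYNAMODB_JSON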

shiladitya
0

You can try this code locally. But first, the following command should be executed: npm init -y && npm install aws-sdk

const AWS = require('aws-sdk');
AWS.config.update({region:'eu-central-1'}); 
const fs = require('fs');
const TABLE_NAME = "YOURTABLENAME"

const docClient = new AWS.DynamoDB.DocumentClient({
    "sslEnabled": false,
    "paramValidation": false,
    "convertResponseTypes": false,
    "convertEmptyValues": true
});

async function exportDB(){
    let params = {
        TableName: TABLE_NAME
    };
    let result = [];
    let items;
    do  {
        items =  await docClient.scan(params).promise();
        items.Items.forEach((item) => result.push(item));
        params.ExclusiveStartKey  = items.LastEvaluatedKey;
    }   while(typeof items.LastEvaluatedKey != "undefined");

    fs.writeFileSync("exported_data.json", JSON.stringify(result, null, 4));
    console.info("Exported item count:", result.length);
}
exportDB();

Then run node index.js.

I hope it works for you

Penava
0

You can use dynoport (https://www.npmjs.com/package/dynoport); it will handle exports and imports for you in a very performant way.

-1

DynamoDB now provides a way to export and import data to/from S3: http://aws.amazon.com/about-aws/whats-new/2014/03/06/announcing-dynamodb-cross-region-export-import/

Manish
  • 1
    This is an old blog post that looks to have been replaced by AWS Data Pipeline. There is no import/export tab in DynamoDB anymore – cameck Aug 04 '17 at 21:53
-1

In a similar use-case, I have used DynamoDB Streams to trigger AWS Lambda, which basically wrote to my DW instance. You could probably write your Lambda to write each of the table changes to a table in your non-production account. This way your dev table would remain quite close to prod as well.

-1

I used the awesome CyberChef site (https://gchq.github.io/CyberChef) with its CSV to JSON tool.

mlo55
-3

For really big datasets, running a continuous (and parallel) scan can be a time-consuming and fragile process (imagine it dying in the middle). Fortunately, AWS recently added the ability to export your DynamoDB table data straight to S3. This is probably the easiest way to achieve what you want, because it does not require you to write any code or run any task/script; it's fully managed.

After it's done, you can download the export from S3 and import it into the local DynamoDB instance using logic like foreach record in file: documentClient.putItem, or use some other tooling.
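
As an alternative to the documentClient.putItem loop, here is a rough bash/jq sketch (the bucket, prefix, and table names are placeholders, and it assumes the DYNAMODB_JSON export layout of gzipped files containing one {"Item": ...} object per line):

# Download the export and replay it into the local table in 25-item batches
aws s3 cp s3://my-export-bucket/exports/ ./export --recursive --exclude "*" --include "*.json.gz"
find ./export -name '*.json.gz' -exec zcat {} + | split -l 25 - chunk-
for f in chunk-*; do
  jq -s "{\"my-local-table\": [ .[] | {PutRequest: {Item: .Item}} ]}" "$f" > batch.json
  aws dynamodb batch-write-item --request-items file://batch.json --endpoint-url http://localhost:8000
done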

Rafal Wiliński
-4

In the DynamoDB web console select your table, then Actions -> Export/Import.

Nulldevice