30

I'm working with The AWS Command Line Interface for DynamoDB.

When we query an item, we get a very detailed JSON output. You get something like this (it has been built from the get-item in order to be almost exhaustive (the NULL type has been omitted) aws command line help:

{
    "Count": 1, 
    "Items": [
        {
            "Id": {
                "S": "app1"
            }, 
            "Parameters": {
                "M": {
                    "nfs": {
                        "M": {
                            "IP" : {
                                "S" : "172.16.0.178"
                            }, 
                            "defaultPath": {
                                "S": "/mnt/ebs/"
                            },
                            "key": {
                                "B": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
                            },
                            "activated": {
                                "BOOL": true 
                            }
                        }
                    },
                    "ws" : {
                        "M" : {
                            "number" : {
                                "N" : "5"
                            },
                            "values" : {
                                "L" : [
                                    { "S" : "12253456346346"},
                                    { "S" : "23452353463464"},
                                    { "S" : "23523453461232"},
                                    { "S" : "34645745675675"},
                                    { "S" : "46456745757575"}
                                ]
                            }
                        } 
                    }
                }
            },
            "Oldtypes": {
                "typeSS" : {"SS" : ["foo", "bar", "baz"]},
                "typeNS" : {"NS" : ["0", "1", "2", "3", "4", "5"]},
                "typeBS" : {"BS" : ["VGVybWluYXRvcgo=", "VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK", "VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=", "VGVybWluYXRvciA0OiBTYWx2YXRpb24K","VGVybWluYXRvciA1OiBHZW5lc2lzCg=="]}
            }
        }
    ], 
    "ScannedCount": 1, 
    "ConsumedCapacity": null
}

Is there any way to get a simpler output for the Items part? Like this:

{
    "ConsumedCapacity": null,
    "Count": 1,
    "Items": [
        {
            "Id": "app1",
            "Parameters": {
                "nfs": {
                    "IP": "172.16.0.178",
                    "activated": true,
                    "defaultPath": "/mnt/ebs/",
                    "key": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
                },
                "ws": {
                    "number": 5,
                    "values": ["12253456346346","23452353463464","23523453461232","34645745675675","46456745757575"]
                }
            },
            "Oldtypes": {
                "typeBS": ["VGVybWluYXRvcgo=", "VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK", "VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=", "VGVybWluYXRvciA0OiBTYWx2YXRpb24K", "VGVybWluYXRvciA1OiBHZW5lc2lzCg=="],
                "typeNS": [0, 1, 2, 3, 4, 5],
                "typeSS": ["foo","bar","baz"]
            }
        }
    ],
    "ScannedCount": 1
}

There is nothing helpful in the dynamodb - AWS CLI 1.7.10 documentation.

We must get the result from the command line. I'm willing to use other command line tools like jq if necessary, but such a jq mapping appears to complicated to me.


Update 1: jq based solution (with help from DanielH's answer)

With jq it is easy, but not quite pretty, you can do something like:

$> aws dynamodb query --table-name ConfigCatalog --key-conditions '{ "Id" : {"AttributeValueList": [{"S":"app1"}], "ComparisonOperator": "EQ"}}' | jq -r '.Items[0].Parameters.M."nfs#IP".S'

Result will be: 172.16.0.178

The jq -r option gives you a raw output.


Update 2: jq based solution (with help from @jeff-mercado)

Here is an updated and commented version of Jeff Mercado jq function to unmarshall DynamoDB output. It will give you the expected output:

$> cat unmarshal_dynamodb.jq
def unmarshal_dynamodb:
  # DynamoDB string type
  (objects | .S)

  # DynamoDB blob type
  // (objects | .B)

  # DynamoDB number type
  // (objects | .N | strings | tonumber)

  # DynamoDB boolean type
  // (objects | .BOOL)

  # DynamoDB map type, recursion on each item
  // (objects | .M | objects | with_entries(.value |= unmarshal_dynamodb))

  # DynamoDB list type, recursion on each item
  // (objects | .L | arrays | map(unmarshal_dynamodb))

  # DynamoDB typed list type SS, string set
  // (objects | .SS | arrays | map(unmarshal_dynamodb))

  # DynamoDB typed list type NS, number set
  // (objects | .NS | arrays | map(tonumber))

  # DynamoDB typed list type BS, blob set
  // (objects | .BS | arrays | map(unmarshal_dynamodb))

  # managing others DynamoDB output entries: "Count", "Items", "ScannedCount" and "ConsumedCapcity"
  // (objects | with_entries(.value |= unmarshal_dynamodb))
  // (arrays | map(unmarshal_dynamodb))

  # leaves values
  // .
  ;
unmarshal_dynamodb

If you save the DynamoDB query output to a file, lets say ddb-query-result.json, you can execute to get desired result:

$> jq -f unmarshal_dynamodb.jq ddb-query-result.json
herve
  • 3,825
  • 2
  • 18
  • 27
  • 1
    Hmm, so is it that the object's key names indicate their types? Like "S" is for strings, "M" is for maps, and "N" for numbers? You can actually do something really nice with that. – Jeff Mercado Feb 21 '15 at 02:02
  • 2
    Your `unmarshal_dynamodb.jq` solution is brilliant and credit goes to you & @JeffMercado. One flaw in using the `//` that I've been trying to resolve is that any filter that returns false doesn't get transformed. This matters with boolean values that are actually set to false - they keep the `BOOL` or `B` key. I've added a line to partially resolve this, but still haven't found a way to fully fix it without a second pass: `// (objects | if has("BOOL") or has("B") then [false] else null end)` This adds `false` as a 1-element array and needs to go before the "# managing others..." line. – Dave Stern Apr 25 '15 at 01:17
  • 1
    @DaveStern: I revised the method used here to properly handle falsy values. And should now have an overall cleaner implementation. – Jeff Mercado Oct 11 '17 at 23:59
  • Def worth using @JeffMercado's answer below if you have BOOLs in your schema. – johnboiles Feb 03 '20 at 19:48
  • I don't usually comment, but really brilliant. Thank you! – Paul Fowler Jun 28 '21 at 22:59
  • This is great! The only wrinkle is that it doesn't handle `null` (or rather `"NULL": true`). I was able to fix that by adding this at the beginning: `walk( if type == "object" and has("NULL") then . |= null else . end ) |` – juanes Dec 14 '21 at 00:18

6 Answers6

23

You can decode the values recursively with a well crafted function. It looks like the key names correspond to a type:

S -> string
N -> number
M -> map

Handle each of the cases you want to decode if possible, otherwise filter it out. You can make use of the various type filters and the alternative operator to do so.

$ cat input.json
{
  "Count": 1,
  "Items": [
    {
      "Id": { "S": "app1" },
      "Parameters": {
        "M": {
          "nfs#IP": { "S": "192.17.0.13" },
          "maxCount": { "N": "1" },
          "nfs#defaultPath": { "S": "/mnt/ebs/" }
        }
      }
    }
  ],
  "ScannedCount": 1,
  "ConsumedCapacity": null
}
$ cat ~/.jq
def decode_ddb:
    def _sprop($key): select(keys == [$key])[$key];                 # single property objects only
       ((objects | { value: _sprop("S") })                          # string (from string)
    // (objects | { value: _sprop("NULL") | null })                 # null (from boolean)
    // (objects | { value: _sprop("B") })                           # blob (from string)
    // (objects | { value: _sprop("N") | tonumber })                # number (from string)
    // (objects | { value: _sprop("BOOL") })                        # boolean (from boolean)
    // (objects | { value: _sprop("M") | map_values(decode_ddb) })  # map (from object)
    // (objects | { value: _sprop("L") | map(decode_ddb) })         # list (from encoded array)
    // (objects | { value: _sprop("SS") })                          # string set (from string array)
    // (objects | { value: _sprop("NS") | map(tonumber) })          # number set (from string array)
    // (objects | { value: _sprop("BS") })                          # blob set (from string array)
    // (objects | { value: map_values(decode_ddb) })                # all other non-conforming objects
    // (arrays | { value: map(decode_ddb) })                        # all other non-conforming arrays
    // { value: . }).value                                          # everything else
    ;
$ jq 'decode_ddb' input.json
{
  "Count": 1,
  "Items": [
    {
      "Id": "app1",
      "Parameters": {
        "nfs#IP": "192.17.0.13",
        "maxCount": 1,
        "nfs#defaultPath": "/mnt/ebs/"
      }
    }
  ],
  "ScannedCount": 1,
  "ConsumedCapacity": null
}
Jeff Mercado
  • 129,526
  • 32
  • 251
  • 272
  • Thanks to @jeff-mercado's help. I posted an extended version of his `decode_ddb.jq` function as an update of the original post. – herve Feb 23 '15 at 18:53
2

Another way to achieve the post's goal would be to use a node.js extension like node-dynamodb or dynamodb-marshaler and build a node command line tool.

Interesting tutorial to build a node.js command line application with commander package: Creating Your First Node.js Command-line Application


Here's a quick and dirty oneliner that reads one record from stdin and prints it in simplified form:

node -e 'console.log(JSON.stringify(require("aws-sdk").DynamoDB.Converter.unmarshall(JSON.parse(require("fs").readFileSync(0, "utf-8")))))'
tuomassalo
  • 8,717
  • 6
  • 48
  • 50
herve
  • 3,825
  • 2
  • 18
  • 27
1

Here's an updated version of the jq solution that can handle null values.

$> cat unmarshal_dynamodb.jq
def unmarshal_dynamodb:
  # null
  walk( if type == "object" and .NULL then . |= null else . end ) |

  # DynamoDB string type
  (objects | .S)

  # DynamoDB blob type
  // (objects | .B)

  # DynamoDB number type
  // (objects | .N | strings | tonumber)

  # DynamoDB boolean type
  // (objects | .BOOL)

  # DynamoDB map type, recursion on each item
  // (objects | .M | objects | with_entries(.value |= unmarshal_dynamodb))

  # DynamoDB list type, recursion on each item
  // (objects | .L | arrays | map(unmarshal_dynamodb))

  # DynamoDB typed list type SS, string set
  // (objects | .SS | arrays | map(unmarshal_dynamodb))

  # DynamoDB typed list type NS, number set
  // (objects | .NS | arrays | map(tonumber))

  # DynamoDB typed list type BS, blob set
  // (objects | .BS | arrays | map(unmarshal_dynamodb))

  # managing others DynamoDB output entries: "Count", "Items", "ScannedCount" and "ConsumedCapcity"
  // (objects | with_entries(.value |= unmarshal_dynamodb))
  // (arrays | map(unmarshal_dynamodb))

  # leaves values
  // .
  ;
unmarshal_dynamodb
$> jq -f unmarshal_dynamodb.jq ddb-query-result.json

Credit to @jeff-mercado and @herve for the original version.

juanes
  • 155
  • 1
  • 7
0

As far as I know, there is no other output like the "verbose" one you've posted. Therefore I think, you can't avoid intermediate tools like jq oder sed

There are several proposals in this post for converting the raw dynamo data:

Export data from DynamoDB

Maybe you can adapt one of these scripts in conjunction with jq or sed

Community
  • 1
  • 1
DanielH
  • 172
  • 1
  • 10
  • with `jq` it is easy, but not quiet pretty, you can do something like: `aws dynamodb query --table-name ConfigCatalog --key-conditions '{ "Id" : {"AttributeValueList": [{"S":"app1"}], "ComparisonOperator": "EQ"}}' | jq -r '.Items[0].Parameters.M."nfs#IP".S'` result will be: `172.16.0.178`. See updated post. – herve Feb 20 '15 at 22:13
0

Here is another approach. This may be a little brutal but it shows the basic idea.

def unwanted:    ["B","BOOL","M","S","L","BS","SS"];
def fixpath(p):  [ p[] | select( unwanted[[.]]==[] ) ];
def fixnum(p;v):
    if   p[-2]=="NS" then [p[:-2]+p[-1:],(v|tonumber)]
    elif p[-1]=="N" then [p[:-1], (v|tonumber)]
    else [p,v] end;

reduce (tostream|select(length==2)) as [$p,$v] (
    {}
  ; fixnum(fixpath($p);$v) as [$fp,$fv]      
  | setpath($fp;$fv)
)

Try it online!

Sample Run (assuming filter in filter.jq and data in data.json)

$ jq -M -f filter.jq data.json
{
  "ConsumedCapacity": null,
  "Count": 1,
  "Items": [
    {
      "Id": "app1",
      "Oldtypes": {
        "typeBS": [
          "VGVybWluYXRvcgo=",
          "VGVybWluYXRvciAyOiBKdWRnbWVudCBEYXkK",
          "VGVybWluYXRvciAzOiBSaXNlIG9mIHRoZSBNYWNoaW5lcwo=",
          "VGVybWluYXRvciA0OiBTYWx2YXRpb24K",
          "VGVybWluYXRvciA1OiBHZW5lc2lzCg=="
        ],
        "typeNS": [
          0,
          1,
          2,
          3,
          4,
          5
        ],
        "typeSS": [
          "foo",
          "bar",
          "baz"
        ]
      },
      "Parameters": {
        "nfs": {
          "IP": "172.16.0.178",
          "activated": true,
          "defaultPath": "/mnt/ebs/",
          "key": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
        },
        "ws": {
          "number": 5,
          "values": [
            "12253456346346",
            "23452353463464",
            "23523453461232",
            "34645745675675",
            "46456745757575"
          ]
        }
      }
    }
  ],
  "ScannedCount": 1
}
jq170727
  • 13,159
  • 3
  • 46
  • 56
0

Here is a script in node to do this.

I named the file reformat.js but you can call it whatever you want

'use strict';

/**
 * This script will parse the AWS dynamo CLI JSON response into JS.
 * This parses out the type keys in the objects.
 */

const fs = require('fs');

const rawData = fs.readFileSync('response.json'); // Import the raw response from the dynamoDB CLI query
const response = JSON.parse(rawData); // Parse to JS to make it easier to work with.

function shallowFormatData(data){
  // Loop through the object and replace the Type key with the value.
  for(const key in data){
    const innerRawObject = data[key]
    const innerKeys = Object.keys(innerRawObject)
    innerKeys.forEach(innerKey => {
      const innerFormattedObject = innerRawObject[innerKey]
      if(typeof innerFormattedObject == 'object'){
        data[key] = shallowFormatData(innerFormattedObject) // Recursively call formatData if there are nested objects
      }else{
        // Null items come back with a type of "NULL" and value of true. we want to set the value to null if the type is "NULL"
        data[key] = innerKey == 'NULL' ? null : innerFormattedObject
      }
    })
  }
  return data
}

// this only gets the Items and not the meta data.
const result = response.Items.map(item => {
  return shallowFormatData(item)
})

console.dir(result, {'maxArrayLength': null}); // There is a default limit on how big a console.log can be, this removes that limit.

Step 1) run your dynamoDB query via the CLI and save it to a JSON file. To save the response from the CLI just add > somefile.json. For convenience, I saved this in the same directory as my reformat file

// Example: Run in CLI

$ aws dynamodb query --table-name stage_requests-service_FoxEvents \
 --key-condition-expression "PK = :v1" \
 --expression-attribute-values file://expression-attributes.json > response.json

expression-attributes.json

{
  ":v1": {"S": "SOMEVAL"}
}

If you need more information on how I queried DynamoDB look at these examples in the documentation https://docs.aws.amazon.com/cli/latest/reference/dynamodb/query.html#examples

Now that you have a JSON file of the data you need to reformat run the format.js script from your terminal

Step 2)

// Run this in your terminal
$ node reformat.js > formatted.js 

You should have a clean JS Object output if you want a JSON object output just put a JSON.stringify(result) in the console.dir at the end of the script

Mike
  • 96
  • 5