Using jq to parse a key present in two lists (even though it might not exist in one of them)

Question

(It was hard to come up with a title that summarizes the issue, so feel free to improve it.)

I have a JSON file with the following content:

{
    "Items": [
        {
            "ID": {
                "S": "ID_Complete"
            }, 
            "oldProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_old]"
                    }, 
                    {
                        "S": "[property_B : value_B_old]"
                    }
                ]
            },
            "newProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_new]"
                    }, 
                    {
                        "S": "[property_B : value_B_new]"
                    }
                ]
            }
        }, 
        {
            "ID": {
                "S": "ID_Incomplete"
            }, 
            "oldProperties": {
                "L": [
                    {
                        "S": "[property_B : value_B_old]"
                    }
                ]
            },
            "newProperties": {
                "L": [
                    {
                        "S": "[property_A : value_A_new]"
                    }, 
                    {
                        "S": "[property_B : value_B_new]"
                    }
                ]
            }
        }
    ]
}

I would like to manipulate the data with jq so that, for each item in Items[] that has a new value for property_A (in the newProperties list), it outputs an object with the corresponding id, old and new fields (see the desired output below), regardless of the value that property has in the oldProperties list. Moreover, if property_A does not exist in oldProperties, I still need the old field to be populated with null (or any fixed string, for that matter).

Desired output:

{
  "id": "ID_Complete",
  "old": "[property_A : value_A_old]",
  "new": "[property_A : value_A_new]"
}
{
  "id": "ID_Incomplete",
  "old": null,
  "new": "[property_A : value_A_new]"
}

Note: Even though property_A doesn't exist in the oldProperties list, other properties may (and will) exist.

The problem I am facing is that I get no output for an item when the desired property does not exist in its oldProperties list. My current jq command looks like this:

jq -r '.Items[] | 
    { id:.ID.S, 
      old:.oldProperties.L[].S | select(. | contains("property_A")),
      new:.newProperties.L[].S | select(. | contains("property_A")) }'

This renders only the ID_Complete case, while I need the ID_Incomplete one as well.

Is there any way to achieve this with jq?

Thanks in advance.
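A likely reason the command above drops ID_Incomplete: when a value in jq object construction (`{key: f}`) produces no outputs, no object is produced at all, and `select(...)` produces no outputs for that item's oldProperties. The sketch below, which assumes jq 1.5 or later (for `first/1`), wraps each lookup in `first(...) // null` so it always yields exactly one value. The temporary file and the compact copy of the sample data exist only to make the script self-contained.

```shell
#!/bin/sh
# Write a compact copy of the question's sample data to a temporary file.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
{"Items":[
 {"ID":{"S":"ID_Complete"},
  "oldProperties":{"L":[{"S":"[property_A : value_A_old]"},{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}},
 {"ID":{"S":"ID_Incomplete"},
  "oldProperties":{"L":[{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}}
]}
EOF

# An empty stream used as an object value produces no object at all:
jq -nc '{a: 1, b: empty}'            # prints nothing
jq -nc '{a: 1, b: (empty // null)}'  # prints {"a":1,"b":null}

# first(...) keeps at most one match; "// null" supplies a value when there is none.
# The final select keeps only items that actually have a new property_A.
out=$(jq -c '.Items[]
  | { id: .ID.S,
      old: (first(.oldProperties.L[].S | select(contains("property_A"))) // null),
      new: (first(.newProperties.L[].S | select(contains("property_A"))) // null) }
  | select(.new != null)' "$data")
printf '%s\n' "$out"
```

Note that `contains("property_A")` is a substring test, so it would also match a hypothetical `property_AB`. A stricter predicate such as `startswith("[property_A :")` avoids that.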

Jeff Mercado · Answer 1 · 2017-10-13T13:53:14.693

Your lists of properties appear to encode the key/value pairs of some object. You could map each list into an object, diff the two objects, and then report on the results.

You could do something like this:

def make_object_from_properties:
      [.L[].S | capture("\\[(?<key>\\w+) : (?<value>\\w+)\\]")]
    | from_entries
    ;
def diff_objects($old; $new):
      def _prop($key): select(has($key))[$key];
      ([($old | keys[]), ($new | keys[])] | unique) as $keys
    | [   $keys[] as $k
        | ({ value: $old | _prop($k) } // { none: true }) as $o
        | ({ value: $new | _prop($k) } // { none: true }) as $n
        | (if   $o.none                 then "add"
          elif  $n.none                 then "remove"
          elif  $o.value != $n.value    then "change"
                                        else "same"
          end) as $s
        | { key: $k, status: $s, old: $o.value, new: $n.value }
      ]
  ;
def diff_properties:
      (.oldProperties | make_object_from_properties) as $old
    | (.newProperties | make_object_from_properties) as $new
    | diff_objects($old; $new) as $diff
    | foreach $diff[] as $d ({ id: .ID.S };
          select($d.status != "same")
        | .old = ((select(any("remove", "change"; . == $d.status)) | "[\($d.key) : \($d.old)]") // null)
        | .new = ((select(any("add", "change";    . == $d.status)) | "[\($d.key) : \($d.new)]") // null)
      )
    ;
[.Items[] | diff_properties]

This yields the following output:

[
  {
    "id": "ID_Complete",
    "old": "[property_A : value_A_old]",
    "new": "[property_A : value_A_new]"
  },
  {
    "id": "ID_Complete",
    "old": "[property_B : value_B_old]",
    "new": "[property_B : value_B_new]"
  },
  {
    "id": "ID_Incomplete",
    "old": null,
    "new": "[property_A : value_A_new]"
  },
  {
    "id": "ID_Incomplete",
    "old": "[property_B : value_B_old]",
    "new": "[property_B : value_B_new]"
  }
]

Your data also seems to be in some kind of encoded format. For a more robust solution, consider defining some functions to decode it; the approaches found here show how you could do that.
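The `S` and `L` wrappers look like DynamoDB's typed attribute-value JSON. That is an assumption based only on the shape of the sample. As a hedged sketch of the decoding idea, the hypothetical `decode_value` function below unwraps only the `S`, `L` and `M` types and leaves anything else untouched. After decoding, selecting property_A no longer needs the `.L[].S` paths.

```shell
#!/bin/sh
# Compact copy of the question's sample data in a temporary file.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
{"Items":[
 {"ID":{"S":"ID_Complete"},
  "oldProperties":{"L":[{"S":"[property_A : value_A_old]"},{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}},
 {"ID":{"S":"ID_Incomplete"},
  "oldProperties":{"L":[{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}}
]}
EOF

out=$(jq -c '
  # Hypothetical decoder for typed values: only S (string), L (list) and M (map) are handled.
  def decode_value:
    if has("S") then .S
    elif has("L") then .L | map(decode_value)
    elif has("M") then .M | map_values(decode_value)
    else . end;

  # Each item is a map from attribute names to typed values.
  .Items[]
  | map_values(decode_value)
  | { id: .ID,
      old: (first(.oldProperties[] | select(contains("property_A"))) // null),
      new: (first(.newProperties[] | select(contains("property_A"))) // null) }
' "$data")
printf '%s\n' "$out"
```

Other type tags (numbers as `N`, `BOOL`, `NULL`, sets) would need their own branches if they appear in the real data.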

Awesome response, thanks. Both this answer and the other one work. Before accepting one, however, I will try them out within the system that is already in place and see which of the two adapts better to it (the question presented a trimmed-down working example of a small portion of the data). — Nacho, Oct 11 '17 at 14:31
After several tests, I was able to adapt the solution proposed in the other answer to my system with fewer changes, so I am accepting that one. However, I want to thank you for your detailed answer; it was also of great value for better understanding how jq works. — Nacho, Oct 13 '17 at 10:35

score 1 · Accepted Answer · answered Oct 10 '17 at 18:24

This filter produces the desired output.

def parse: capture("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)") ;
def print: "[\(.key) : \(.value)]";
def norm:   [.[][][] | parse | select(.key=="property_A") | print][0];

  .Items
| map({id:.ID.S, old:.oldProperties|norm, new:.newProperties|norm})[]

Sample run (assumes the filter is in filter.jq and the data is in data.json):

$ jq -M -f filter.jq data.json
{
  "id": "ID_Complete",
  "old": "[property_A : value_A_old]",
  "new": "[property_A : value_A_new]"
}
{
  "id": "ID_Incomplete",
  "old": null,
  "new": "[property_A : value_A_new]"
}
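The key detail is that `[...][0]` evaluates to null for an empty array. As a result, `norm` always yields exactly one value, which is why ID_Incomplete is kept with `"old": null`. The sketch below is one possible extension and is not part of the original answer. It passes the property name in with jq's `--arg` option instead of hard-coding it. The variable name `$prop` is chosen here only for illustration.

```shell
#!/bin/sh
# Compact copy of the question's sample data in a temporary file.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
{"Items":[
 {"ID":{"S":"ID_Complete"},
  "oldProperties":{"L":[{"S":"[property_A : value_A_old]"},{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}},
 {"ID":{"S":"ID_Incomplete"},
  "oldProperties":{"L":[{"S":"[property_B : value_B_old]"}]},
  "newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}}
]}
EOF

# Indexing an empty array yields null, so norm returns null when the property is missing.
jq -nc '[empty][0]'   # prints null

# Same filter as above, with the property name supplied via --arg.
out=$(jq -c --arg prop property_A '
  def parse: capture("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)");
  def print: "[\(.key) : \(.value)]";
  def norm: [.[][][] | parse | select(.key == $prop) | print][0];
  .Items[] | {id: .ID.S, old: (.oldProperties | norm), new: (.newProperties | norm)}
' "$data")
printf '%s\n' "$out"
```

Running the same command with `--arg prop property_B` would report property_B instead, with no change to the filter text.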
