Use JQ to select specific, arbitrarily nested objects from JSON

Question

I'm looking for an efficient way to search through a large JSON object for "sub-objects" that match a filter (via select(), I imagine). However, the top-level JSON is an object with arbitrary nesting, containing simple values, objects, and arrays of objects. For example:

{
  "name": "foo",
  "class": "system",
  "description": "top-level-thing",
  "configuration": {
    "status": "normal",
    "uuid": "id"
  },
  "children": [
    {
      "id": "c1",
      "class": "c1",
      "children": [
        {
          "id": "c1.1",
          "class": "c1.1"
        },
        {
          "id": "c1.1",
          "class": "FINDME"
        }
      ]
    },
    {
      "id": "c2",
      "class": "FINDME"
    }
  ],
  "thing": {
    "id": "c3",
    "class": "FINDME"
  }
}

I have a solution which does part of what I want (and is understandable):

jq -r '.. | arrays | .[] | select(.class=="FINDME"?) | .id'

which returns:

c2
c1.1

... however, it misses c3, and it changes the order of the items in the output. Additionally, I'm expecting this to operate on potentially very large JSON structures, so I would like to make sure I find an efficient solution. Bonus points for something that remains readable by jq neophytes (myself included).

FWIW, references I was using to help me on the way, in case they help others:

score 8 · Answer 1 · answered Dec 19 '17 at 17:07

For small to modest-sized JSON input, you're on the right track with .. but it seems you want to select objects, like so:

.. | objects | select(.class=="FINDME"?) | .id

For JSON documents that are very large, this might require too much memory, so it may be worth knowing about jq's streaming parser. Unfortunately it's much more difficult to use, so I'd suggest trying the above, and if you're interested, look in the usual places for documentation about the --stream option.
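As a concrete illustration of the filter above, here is a minimal sketch of a complete invocation. The JSON in `sample` is a cut-down version of the question's data and the variable names are made up for this example; `-r` only strips the quotes from the printed strings.

```shell
# Trimmed version of the sample from the question (illustrative only).
sample='{
  "class": "system",
  "children": [
    {"id": "c1", "class": "c1",
     "children": [{"id": "c1.1", "class": "FINDME"}]},
    {"id": "c2", "class": "FINDME"}
  ],
  "thing": {"id": "c3", "class": "FINDME"}
}'

# `..` visits every value depth-first, `objects` keeps only the objects,
# and `select` keeps those whose "class" equals "FINDME".
ids=$(printf '%s' "$sample" | jq -r '.. | objects | select(.class=="FINDME"?) | .id')
printf '%s\n' "$ids"
```

On this input it should print c1.1, c2 and c3, one per line and in document order, because `..` reaches objects that are values of keys (such as "thing") as well as objects inside arrays.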

Well, that should have been obvious - I'll use this option until I run into performance issues. — crimson-egret, Dec 19 '17 at 17:57

score 3 · Accepted Answer · edited Dec 20 '17 at 06:07

Here's a streaming-parser solution. To make sense of it, you'll need to read up on the --stream option, but the key is that the output includes lines of the form: [PATH, VALUE]
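To get a feel for those events before reading the program, a small sketch like the following may help. The one-object input is made up for illustration, and `-c` is used only so that each event is printed on its own line.

```shell
# Made-up input, for illustration only.
sample='{"thing": {"id": "c3", "class": "FINDME"}}'

# With --stream, jq emits one [PATH, VALUE] event per leaf value, plus a
# length-1 [PATH] event each time an object or array closes.
events=$(printf '%s' "$sample" | jq -c --stream .)
printf '%s\n' "$events"
```

This should print four events: [["thing","id"],"c3"], then [["thing","class"],"FINDME"], then the two closing events [["thing","class"]] and [["thing"]]. The length-1 closing events are the ones that the `length != 2` test in the program reacts to.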

program.jq

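# State: null, or a partially assembled object holding the most recent "id"
# (and "class", once it equals "FINDME") seen in the event stream.
foreach inputs as $in (null;
  # A match was emitted on the previous step, so reset the state.
  if has("id") and has("class") then null
  else . as $x
  | $in
  # Events of length 1 mark the end of an object or array, so reset.
  | if length != 2 then null
    elif .[0][-1] == "id" then ($x + {id: .[-1]})
    elif .[0][-1] == "class"
         and .[-1] == "FINDME" then ($x + {class: .[-1]})
    else $x
    end
  end;
  # Emit the id once both keys have been collected.
  select(has("id") and has("class")) | .id )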

Invocation

jq -n --stream -f program.jq input.json

Output with sample input

"c1.1"
"c2"
"c3"

answered Dec 19 '17 at 17:43 by peak · edited Dec 20 '17 at 06:07 by Inian

While less readable than the other answer you gave, it does what I want, including retaining the order, and I'll learn something from its use. Thanks. – crimson-egret Dec 19 '17 at 18:01
Please note the update to remove the assumption. How about posting some details about your file size and comparative timings? – peak Dec 19 '17 at 18:13
Thanks for that assumption-removing update. Since the output will likely be slightly different from my example, that bit is helpful. As for timing, I don't have real data sets yet, so I can't provide that. I might generate some simulated data sets, and if I do, I'll post a comparison then. – crimson-egret Dec 19 '17 at 23:01
