
I have a bunch of hundred-thousand-line JSON files, and I'm trying to work out how they are structured.

I'd like to print the path to all keys named "ENTITY" with the value "TEXT".

These can be nested at any level. There are lots of examples for finding one at a particular level, e.g. "Select objects based on value of variable in object using jq".

But I'm actually trying to figure out where these items are nested; the files are so large that I can't do it by inspection.

curiousity

2 Answers

paths( objects | .ENTITY == "TEXT" )

Format the output as desired. For example,

jq -r 'paths( objects | .ENTITY == "TEXT" ) | join(".")'
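To see what this prints, here is a minimal sketch run against a small made-up document. The sample JSON and every key name other than ENTITY are assumptions for illustration, and jq 1.6 or later is assumed so that `join` accepts the numeric array indices.

```shell
# Hypothetical sample input; only the ENTITY/"TEXT" pair comes from the question.
sample='{
  "doc": {
    "ENTITY": "TEXT",
    "children": [
      {"ENTITY": "IMAGE"},
      {"ENTITY": "TEXT", "value": "hi"}
    ]
  },
  "meta": {"ENTITY": "OTHER"}
}'

# Print the dotted path of every object whose ENTITY is "TEXT".
result=$(printf '%s\n' "$sample" | jq -r 'paths( objects | .ENTITY == "TEXT" ) | join(".")')
printf '%s\n' "$result"
```

With this input, the two matches are `doc` and `doc.children.1`; array elements appear by their index.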



[The following consists of my original answer]

path( .. | select( type == "object" and .ENTITY == "TEXT" ) )

Format the output as desired. For example,

jq -r 'path( .. | select( type == "object" and .ENTITY == "TEXT" ) ) | join(".")'
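One observable difference between the two versions: `paths` never emits the empty path, so this original `path(..)` form also reports a match at the top level of the document, while the shorter form does not. A small sketch on made-up input (the sample JSON is an assumption for illustration):

```shell
# Hypothetical input where the root object itself has ENTITY == "TEXT".
sample='{"ENTITY": "TEXT", "child": {"ENTITY": "TEXT"}}'

# Original form: the root object matches too, reported as the empty path [].
with_root=$(printf '%s\n' "$sample" | jq -c 'path( .. | select( type == "object" and .ENTITY == "TEXT" ) )')

# Shorter form: paths/1 skips the root, so only the nested match is reported.
without_root=$(printf '%s\n' "$sample" | jq -c 'paths( objects | .ENTITY == "TEXT" )')

printf 'path(..):\n%s\n' "$with_root"
printf 'paths:\n%s\n' "$without_root"
```

If a match at the root matters for your files, the `path(..)` form is the one that will show it.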


ikegami

If I understood your question correctly, you are searching for leaves with a given key/field whose value matches a given value. This approach uses leaf_paths to get the paths of all leaves and getpath to get their values, transpose to pair each path with its value, and finally select to reduce the list to the pairs matching the criteria. The output contains only the path arrays.

jq --arg key "ENTITY" --arg value "TEXT" '
  [[leaf_paths],[getpath(leaf_paths)]]
  | transpose
  | map(select(.[0][-1] == $key and .[1] == $value))[][0]
'
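As a rough illustration, the same filter can be run on a small made-up document. The sample JSON is an assumption, and this sketch substitutes `paths(scalars)` for `leaf_paths`, which is how `leaf_paths` is defined in jq's builtins.

```shell
# Hypothetical sample input; only the ENTITY/"TEXT" pair comes from the question.
sample='{
  "doc": {
    "ENTITY": "TEXT",
    "children": [
      {"ENTITY": "IMAGE"},
      {"ENTITY": "TEXT", "value": "hi"}
    ]
  },
  "meta": {"ENTITY": "OTHER"}
}'

# Collect all leaf paths and all leaf values, pair them up with transpose,
# then keep the paths whose last component is $key and whose value is $value.
leaf_result=$(printf '%s\n' "$sample" | jq -c --arg key "ENTITY" --arg value "TEXT" '
  [[paths(scalars)],[getpath(paths(scalars))]]
  | transpose
  | map(select(.[0][-1] == $key and .[1] == $value))[][0]
')
printf '%s\n' "$leaf_result"
```

Each reported path ends in the matching key itself, for example `["doc","ENTITY"]`, rather than in the object that contains it.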
pmf
  • This answer also works. For some reason, ikegami's answer runs about 5 times as fast. – curiousity Oct 12 '21 at 19:05
  • If I were to guess, it's because my approach filters early, whereas this approach constructs a huge number of arrays that will end up being discarded. – ikegami Oct 12 '21 at 19:11
  • Tip: "`leaf_paths` is *deprecated* and will be removed in the next major release." Use `paths(scalars)` instead. – ikegami Oct 12 '21 at 19:32
  • I think the approach you have in mind is closer to: `leaf_paths as $p | [$p, getpath($p)] | select(.[0][-1] == $key and .[1] == $value)` – peak Oct 13 '21 at 03:05
  • That said, the above can be optimized to `paths(. == $value) | select(.[-1] == $key)`. I still prefer mine (`paths( objects | .ENTITY == "TEXT" )`), though. 1) It gives the path to the object rather than ending all paths with `ENTITY`. 2) It's cleaner. 3) It should also be faster by filtering out more sooner. – ikegami Oct 13 '21 at 14:54
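The difference described in the last comment can be sketched on a small made-up document (the sample JSON is an assumption for illustration). The trailing `.[:-1]` in the last command is not from the thread; it is added here only to show that dropping the final key from each leaf path yields the same object paths.

```shell
# Hypothetical sample input; only the ENTITY/"TEXT" pair comes from the question.
sample='{"doc": {"ENTITY": "TEXT", "children": [{"ENTITY": "IMAGE"}, {"ENTITY": "TEXT"}]}}'

# Streamlined leaf-based form from the comment: every path ends with the key.
leaf_form=$(printf '%s\n' "$sample" | jq -c --arg key "ENTITY" --arg value "TEXT" \
  'paths(. == $value) | select(.[-1] == $key)')

# Object-based form: each path points at the object that holds the key.
object_form=$(printf '%s\n' "$sample" | jq -c 'paths( objects | .ENTITY == "TEXT" )')

# Assumption for illustration: stripping the last component of each leaf path
# gives the containing object's path.
stripped=$(printf '%s\n' "$sample" | jq -c --arg key "ENTITY" --arg value "TEXT" \
  'paths(. == $value) | select(.[-1] == $key) | .[:-1]')

printf 'leaf form:\n%s\n' "$leaf_form"
printf 'object form:\n%s\n' "$object_form"
```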