
In a very large nested JSON structure, I'm trying to find all of the paths that end in a given key.

Example:

{
  "A": {
    "A1": {
      "foo": {
        "_": "_"
      }
    },
    "A2": {
      "_": "_"
    }
  },
  "B": {
    "B1": {}
  },
  "foo": {
    "_": "_"
  }
}

Searching for "foo" would print something along the lines of: ["A","A1","foo"], ["foo"]


Unfortunately, I don't know at what level of nesting the keys will appear, so I haven't been able to solve this with a simple select. I've gotten close with jq '[paths] | .[] | select(contains(["foo"]))', but the output includes every path that contains foo anywhere, not just the paths that end in it. Output: ["A","A1","foo"], ["A","A1","foo","_"], ["foo"], ["foo","_"]

Bonus points if I could keep the original data structure but simply filter out all paths that don't contain the key (in this case, the subtrees under "foo" wouldn't need to be hidden).

peak
chrisst

2 Answers


With your input:

$ jq -c 'paths | select(.[-1] == "foo")' 
["A","A1","foo"]
["foo"]

Bonus points:

(1) If your jq has tostream:

$ jq 'fromstream(tostream | select(.[0] | index("foo")))'

Or better yet, since your input is large, you can use the streaming parser (jq -n --stream) with this filter:

fromstream(inputs | select(.[0] | index("foo")))

(2) Whether or not your jq has tostream:

. as $in
| reduce (paths(scalars) | select(index("foo"))) as $p
    (null; setpath($p; $in|getpath($p)))

In all three cases, the output is:

{
  "A": {
    "A1": {
      "foo": {
        "_": "_"
      }
    }
  },
  "foo": {
    "_": "_"
  }
}
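
For reference, the streaming variant from (1) could be wrapped up roughly as follows. This is only a sketch: the function name `filter_by_key_stream` and its argument order are made up for illustration, and the key is passed in with `--arg` instead of being hard-coded. One thing worth checking on your own data: `fromstream` only emits a value once it sees a top-level closing event, so the result may depend on where the key sits in the document.

```shell
# Hypothetical helper: keep only the subtrees whose path contains the given key.
# Usage: filter_by_key_stream KEY FILE
filter_by_key_stream() {
  # -n with --stream: read the file as a stream of [path, leaf] events via
  # `inputs`, instead of loading the whole document into memory first.
  jq -n -c --stream --arg key "$1" \
    'fromstream(inputs | select(.[0] | index($key)))' "$2"
}
```

Calling it as `filter_by_key_stream foo input.json` (where `input.json` is a placeholder file name) should reproduce the output above in compact form.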
peak
  • This is so awesome! I took your first version and put it into a bash function alongside my aliases: `json_path() { cat $2 | jq -c "paths | select(.[-1] == \"$1\")";}` Thank you! – themaninthewoods Sep 15 '21 at 21:12
  • @themaninthewoods - Please note that shell-based string interpolation is at best fragile. Consider using `jq --arg s "$1" ......` instead. – peak Sep 15 '21 at 22:42
  • `jq -c 'paths | select(.[-1] == "foo")' ` helped a lot. Can I print the value for each of those paths, too? I didn't find a simple solution, and `paths` seems to forget the value. – towi Jan 20 '23 at 09:07
  • Use `getpath/1`. – peak Jan 20 '23 at 13:09
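
Putting the two suggestions from these comments together, a version of the function might look like the sketch below. The name `json_path` and the argument order come from the comment above; the output shape (`path` plus `value`) and the variable name `key` are assumptions chosen for illustration.

```shell
# Sketch: print every path ending in the given key, together with its value.
# Usage: json_path KEY FILE
json_path() {
  # --arg passes the key as a jq string variable, so no shell interpolation
  # happens inside the jq program. The parentheses around the path expression
  # keep `.` bound to the whole document, which getpath/1 needs.
  jq -c --arg key "$1" \
    '(paths | select(.[-1] == $key)) as $p | {path: $p, value: getpath($p)}' "$2"
}
```

With the question's input, `json_path foo input.json` (a placeholder file name) should print one object per matching path, for example `{"path":["foo"],"value":{"_":"_"}}`.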

I had the same fundamental problem.

With YAML input like:

developer:
  android:
    members:
    - alice
    - bob
    oncall:
    - bob
hr:
  members:
  - charlie
  - doug
this:
  is:
    really:
      deep:
        nesting:
          members:
          - example deep nesting

I wanted to find all arbitrarily nested groups and get their members.

Using this:

yq . | # convert yaml to json using python-yq
    jq ' 
    . as $input | # Save the input for later
    paths | # Get the list of paths
        select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix")) | # Keep only paths that end with members, oncall, or priv (case-insensitive)
        . as $path | # save each path in the $path variable
    ( $input | getpath($path) ) as $members | # Get the value of each path from the original input
    {
        "key": ( $path | join("-") ), # The key is the join of all path keys
        "value": $members  # The value is the list of members
    }
    ' |
    jq -s 'from_entries' | # collect kv pairs into a full object using slurp
    yq --sort-keys -y . # Convert back to yaml using python-yq

I get output like this:

developer-android-members:
  - alice
  - bob
developer-android-oncall:
  - bob
hr-members:
  - charlie
  - doug
this-is-really-deep-nesting-members:
  - example deep nesting
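
If the data is already JSON, the two jq invocations can probably be folded into one by collecting the entries in an array before calling `from_entries`. The following is a sketch under that assumption; the function name `group_members` is hypothetical, and the yq conversion steps are left out.

```shell
# Hypothetical helper: map each "members"/"oncall"/"priv" path to its value.
# Usage: group_members FILE   (FILE must already be JSON)
group_members() {
  jq -c '
    . as $input                      # Save the whole document for getpath
    | [ paths
        # Keep only paths whose last element matches one of the group keys
        | select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix"))
        | { key: (join("-")),        # Join the path elements into one key
            value: (. as $p | $input | getpath($p)) } ]
    | from_entries                   # Build a single object from the entries
  ' "$1"
}
```

Collecting into an array first avoids the extra `jq -s` pass, at the cost of holding all entries in memory at once.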
Oliver I