JMESPath expression to filter objects by property and return a list of the object names that have this property set

Question

Is it possible to write a JMESPath expression that returns a list of object names where a specific subproperty value is set? In the example below I'd like to get a list of all hostnames where fileexists.stat.exists is set to true.

My goal is to use the Ansible hostvars structure to get a list of all hosts where a specific file is present.

{
"hostvars": {
    "oclab1n01.example.org": {
        "fileexists": {
            "changed": false, 
            "failed": false, 
            "stat": {
                "exists": false
            }
        }
    }, 
    "oclab1n02.example.org": {
        "fileexists": {
            "changed": false, 
            "failed": false, 
            "stat": {
                "exists": true
            }
        }
    }, 
    "oclab1n03.example.org": {
        "fileexists": {
            "changed": false, 
            "failed": false, 
            "stat": {
                "exists": true
            }
        }
    }
} }

In this example I'd like to get the following output

["oclab1n02.example.org", "oclab1n03.example.org"]

Looks like a duplicate of https://stackoverflow.com/q/41579581 — myrdd, Jun 13 '18 at 23:42

dreftymac · Answer 1 · 2021-06-11T12:18:36.173

Short answer (TL;DR)

Yes, this is possible, but it is cumbersome because, at least as far as JMESPath is concerned, the source dataset is poorly normalized for this kind of general-purpose query.

Context

jmespath query language
querying object properties for deeply nested objects

Problem

How to construct a jmespath query with filter expressions
The goal is to filter on objects with arbitrarily nested object properties

Solution

This can be done with jmespath, but the operation will be cumbersome
One problematic issue: the source dataset is poorly normalized for this kind of jmespath query
To construct the jmespath query, we have to assume that all the top-level object keys are known in advance
In this specific example, we have to know in advance that there are exactly three hostnames, and what they are ... this is unfavorable if we want the flexibility to handle an arbitrary number of hostnames

Example

The following (way-too-huge) jmespath query ...

  [
    {
      "hostname": `"oclab1n01.example.org"`
      ,"fileexists_stat_exists":  @.hostvars."oclab1n01.example.org".fileexists.stat.exists
    }
    ,{
      "hostname": `"oclab1n02.example.org"`
      ,"fileexists_stat_exists":  @.hostvars."oclab1n02.example.org".fileexists.stat.exists
    }
    ,{
      "hostname": `"oclab1n03.example.org"`
      ,"fileexists_stat_exists":  @.hostvars."oclab1n03.example.org".fileexists.stat.exists
    }
  ]|[? @.fileexists_stat_exists == `true`]|[*].hostname

returns the following desired result

  [
    "oclab1n02.example.org",
    "oclab1n03.example.org"
  ]

Pitfalls

One major pitfall with this use case is that the source dataset is poorly normalized for this kind of query
A more flattened data structure would be easier to query
Consequently, if possible, a better approach would be to flatten the source dataset before running jmespath queries against it
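
As a rough illustration of the flattening step suggested above, here is a minimal Python sketch that turns the "hostname -> record" mapping into a list of records before any query is run. The helper name to_records and its key_name parameter are hypothetical, chosen only for this example, and the input is a trimmed copy of the dataset from the question.

```python
import json

# Nested "hostname -> record" mapping, shaped like the hostvars in the question
# (the "changed" and "failed" fields are omitted for brevity).
SOURCE = json.loads("""
{"hostvars": {
  "oclab1n01.example.org": {"fileexists": {"stat": {"exists": false}}},
  "oclab1n02.example.org": {"fileexists": {"stat": {"exists": true}}},
  "oclab1n03.example.org": {"fileexists": {"stat": {"exists": true}}}
}}
""")


def to_records(mapping, key_name="hostname"):
    """Turn {key: record} into [{key_name: key, **record}, ...].

    Hypothetical helper: the former dictionary key is copied into each
    record, so the hostname survives once the mapping becomes a list.
    """
    return [{key_name: key, **record} for key, record in mapping.items()]


# Same data, but now a list that a query can iterate over and filter.
flattened = {"hostvars": to_records(SOURCE["hostvars"])}
hostnames = [record["hostname"] for record in flattened["hostvars"]]
```

With the data in this shape, an expression along the lines of hostvars[?fileexists.stat.exists].hostname should no longer need the hostnames to be spelled out in the query itself.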

Alternate example with a different original dataset

If the original data were organized as a list of objects, instead of a set of nested objects within objects, it would be easier to search, sort and filter the list without having to know in advance how many hostname entries are involved.

{"hostvars": [
    {"hostname":"oclab1n01.example.org"
      ,"fileexists":        true
      ,"filechanged":       false
      ,"filefailed":        false
      ,"filestat_exists":   false
      ,"we_can_even_still_deeply_nest":{"however":
           {"im_only_doing":"it here","to":"prove a point"}
         }
     }
    ,{"hostname":"oclab1n02.example.org"
      ,"fileexists":        true
      ,"filechanged":       false
      ,"filefailed":        false
      ,"filestat_exists":   true
     }
    ,{"hostname":"oclab1n03.example.org"
      ,"fileexists":        true
      ,"filechanged":       false
      ,"filefailed":        false
      ,"filestat_exists":   true
     }
  ]
}

The above re-normalized dataset can now be easily queried

hostvars|[? @.filestat_exists == `true`]|[*].hostname
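
For readers less familiar with the filter syntax, the following plain-Python sketch spells out what the query above is doing on the list-shaped data. It is an equivalent written by hand, not output from a JMESPath library; the function name hosts_with_file is hypothetical, and the records are trimmed to the two fields the query touches.

```python
# List-shaped dataset, trimmed to the fields used by the query.
DATA = {"hostvars": [
    {"hostname": "oclab1n01.example.org", "filestat_exists": False},
    {"hostname": "oclab1n02.example.org", "filestat_exists": True},
    {"hostname": "oclab1n03.example.org", "filestat_exists": True},
]}


def hosts_with_file(doc):
    """Hand-written equivalent of the filter-then-project query.

    Step 1 (filter): keep records whose filestat_exists is exactly True.
    Step 2 (project): return only the hostname of each remaining record.
    """
    return [
        record["hostname"]
        for record in doc["hostvars"]
        if record.get("filestat_exists") is True
    ]


matching_hosts = hosts_with_file(DATA)
```

Because the hostname lives inside each record, the filter works the same way no matter how many entries the list contains.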

Why do you think that a "key -> record" relationship is poorly normalized? It seems very idiomatic in configuration management software such as salt and ansible, yet I do find that most jinja filters and, as I've learned today, JMESPath work great for `[val1, val2, val3] | map(f) -> [f(val1), f(val2), f(val3)]`, but not so much for `{key1: val1, key2: val2, key3: val3} | map(f) -> {key1: f(val1), key2: f(val2), key3: f(val3)}`. Is this somehow fundamentally hard to implement? — LLlAMnYP, Jun 11 '21 at 11:56
@LLlAMnYP **//Why [...] a "key -> record" relationship is poorly normalized//** Only for this specific context, not in general. As you mentioned, it is a routinely encountered data pattern. The point here is to work with JMESPath rather than fight against it. **//Is this somehow fundamentally hard to implement//** Not fundamentally. The issue is whether the designer(s) chose to make the mapping data type iterable, just like the list type. It is just a design decision. It may help to think of relational databases, where rows (a list) are easier to iterate over than columns (a mapping). — dreftymac, Jun 11 '21 at 12:15