Elastic Search Query for Distinct Nested Values

Question

I am using the High Level REST Client for Elastic Search 6.2.2. Suppose that I have the following two documents in index "DOCUMENTS" with type "DOCUMENTS":

{
   "_id": 1,
   "Name": "John",
   "FunFacts": {
       "FavColor": "Green",
       "Age": 32
   }
},
{
   "_id": 2,
   "Name": "Amy",
   "FunFacts": {
       "FavFood": "Pizza",
       "Age": 33
   }
}

I want to find out all of the distinct fun facts and their distinct values, ultimately returning an end result of

{
    "FavColor": ["Green"],
    "Age": [32, 33],
    "FavFood": ["Pizza"]
}

It is ok for this to require more than one query to Elastic Search, but I prefer to have only one query. Furthermore, the Elastic Search index may grow to be rather large so I must force as much execution as possible to occur on the ES instance.

This code seems to produce a list of documents containing only FunFacts, but I must still perform the aggregation myself, which is highly undesirable.

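import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Search the "DOCUMENTS" index, restricted to the "DOCUMENTS" type.
SearchRequest searchRequest = new SearchRequest("DOCUMENTS");
searchRequest.types("DOCUMENTS");

// Match every document, but return only the FunFacts part of _source.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
String[] includes = {"FunFacts"};
String[] excludes = {"Name"};
searchSourceBuilder.fetchSource(includes, excludes);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse =
    restHighLevelClient.search(searchRequest);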

Can anyone point me in the right direction? I notice that nearly all of the Elastic Search documentation comes in the form of curl commands, which does not help me much, as I am not well versed enough to translate such commands to Java.

Here is your plot twist. Since users are allowed to decide what will be their fun facts, we cannot know ahead of time what will be the keys inside of the FunFacts Map. :/

Thanks, Matt

falomir · Accepted Answer · 2018-04-18T16:35:41.913


You can do it all in one query by using aggregations. Assuming you have the following documents in your index

{
   "Name": "Jake",
   "FunFacts": {
       "FavFood": "Burgers",
       "Age": 32
   }
}

{
   "Name": "Amy",
   "FunFacts": {
       "FavFood": "Pizza",
       "Age": 33
   }
}

{
   "Name": "Alex",
   "FunFacts": {
       "FavFood": "Burgers",
       "Age": 28
   }
}

and you want to get the distinct "FavFood" values, you could do so by using the following terms aggregation (docs on this topic):

{
  "aggs": {
    "disticnt_fun_facts": {
      "terms": { "field": "FunFacts.FavFood" }
    }
  }
}

which would result in something along these lines:

{
  ...
  "hits": { ... },
  "aggregations": {
    "disticnt_fun_facts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "burgers",
          "doc_count": 2
        },
        {
          "key": "pizza",
          "doc_count": 1
        }
      ]
    }
  }
}

For brevity, I kept only the aggregations part of the response. The important thing to notice is the buckets array: each bucket represents one distinct term that was found (key) together with the number of documents in which it occurs (doc_count).
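Since the question asks for Java rather than curl, the terms aggregation request above might be translated for the High Level REST Client roughly as follows. This is only a minimal sketch, assuming the 6.2.x client API, the index and type names from the question, an already configured `RestHighLevelClient`, and that `FunFacts.FavFood` is aggregatable (a `keyword` field, or a `text` field with fielddata enabled; with default dynamic mappings the keyword sub-field would be `FunFacts.FavFood.keyword`). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Hypothetical helper class; not part of the original answer.
final class DistinctFavFood {

    // Returns the distinct FavFood terms, computed on the Elasticsearch side.
    static List<String> distinctFavFoods(RestHighLevelClient client) throws IOException {
        SearchSourceBuilder source = new SearchSourceBuilder()
                .size(0) // return no hits, only the aggregation result
                .aggregation(AggregationBuilders
                        .terms("disticnt_fun_facts")  // aggregation name, spelled as in the answer
                        .field("FunFacts.FavFood"));  // assumption: the field is aggregatable

        SearchRequest request = new SearchRequest("DOCUMENTS"); // index name from the question
        request.types("DOCUMENTS");
        request.source(source);

        SearchResponse response = client.search(request);

        // Look the aggregation up by name and walk its buckets.
        Terms terms = response.getAggregations().get("disticnt_fun_facts");
        List<String> keys = new ArrayList<>();
        for (Terms.Bucket bucket : terms.getBuckets()) {
            keys.add(bucket.getKeyAsString()); // bucket.getDocCount() holds doc_count
        }
        return keys;
    }
}
```

Note that a terms aggregation returns only the top 10 buckets by default, so for fields with many distinct values you would probably want to raise the limit with `.size(n)` on the terms builder.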

Hope that helps.

edited Apr 18 '18 at 16:35

answered Apr 18 '18 at 16:15

falomir


Thanks @falomir! Is there a way to list all of the keys inside of all of the different `FunFacts` objects? To return, for example, `{"FavFood", "Age", "FavColor"}`. – mrbarret Apr 19 '18 at 13:07
First, I decided to flatten the document. Second, I create my index with a command that includes `"FavFood" : { "type" : "keyword" }` and receive the error message `{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [FavFood] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"` . Why am I receiving this message and what can I do to fix this? – mrbarret Apr 19 '18 at 13:44
I found a good answer to the above comment at https://www.elastic.co/blog/strings-are-dead-long-live-strings . – mrbarret Apr 19 '18 at 14:14
To sort or aggregate on a text field, you need to enable "fielddata" on that field. Fielddata is disabled by default on text fields; you can read about why, and about how to modify your existing mapping to enable it, here: https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html – falomir Apr 20 '18 at 23:54
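On the follow-up question of listing the keys when they are not known ahead of time: one possible approach (an assumption, not something confirmed in this thread) is to read them from the index mapping, which records every field Elasticsearch has seen under `FunFacts` and can be fetched with `GET /<index>/_mapping`. The sketch below assumes the `properties` object of the type mapping has already been parsed into nested Java maps; `FunFactKeys` and `fieldPaths` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the names are illustrative and not from the thread.
final class FunFactKeys {

    // Given the "properties" map of a type mapping (already parsed from JSON),
    // returns the full field path of every key directly under objectField.
    @SuppressWarnings("unchecked")
    static List<String> fieldPaths(Map<String, Object> typeProperties, String objectField) {
        Object field = typeProperties.get(objectField);
        if (!(field instanceof Map)) {
            return Collections.emptyList(); // field is not mapped at all
        }
        Object children = ((Map<String, Object>) field).get("properties");
        if (!(children instanceof Map)) {
            return Collections.emptyList(); // a leaf field has no sub-properties
        }
        List<String> paths = new ArrayList<>();
        for (String key : ((Map<String, Object>) children).keySet()) {
            paths.add(objectField + "." + key); // e.g. "FunFacts.Age"
        }
        return paths;
    }
}
```

Each returned path could then be given its own terms aggregation inside a single search request, which would keep the distinct-value computation on the Elasticsearch side.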
