10

I want to serialize a json string to an Elasticsearch SearchResponse object. It works fine if the json string doesn't contains an aggregation.

If the json string contains an aggregation the XContentParser throws an ParsingException[Could not parse aggregation keyed as [target_field] exception.

The code I use to serialize the json string to an Elasticsearch SearchResponse object:

    Settings settings = Settings.builder().build();
    SearchModule searchModule = new SearchModule(settings, false, new ArrayList<>());

    NamedXContentRegistry xContentRegistry = new NamedXContentRegistry(searchModule.getNamedXContents());

    JsonXContentParser xContentParser = new JsonXContentParser(xContentRegistry,
            new JsonFactory().createParser(json));
    SearchResponse response = SearchResponse.fromXContent(xContentParser);

It seems that I have to register aggregations to the NamedXContentRegistry but i don't know how to.

technocrat
  • 3,513
  • 5
  • 25
  • 39
Julian8
  • 121
  • 1
  • 1
  • 6
  • is the string a valid query? You might want to run that through kibana's dev console or try it out via curl. – Jilles van Gurp Apr 12 '18 at 14:18
  • Yes it is. I allready tested it. – Julian8 Apr 12 '18 at 17:00
  • 1
    Did you ever managed to get this working? – ChrisDekker Apr 21 '18 at 20:14
  • 1
    We ended up doing sterms#distinct_values for the "" (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html). We found the TYPE of the aggregation wasn't getting set otherwise. See also dadoonet's answer in https://discuss.elastic.co/t/elasticsearch-json-response-to-searchresponse-object/124394/3 – Aaron May 02 '18 at 23:23
  • I'm having the same problem. I'm using ES 6.3 and cannot construct a SearchResponse object from a json string using the example linked to by Aaron. – Owen Oct 19 '18 at 16:34
  • Owen if you still need the answer for your question you can check my answer https://stackoverflow.com/questions/25425243/create-dummy-searchresponse-instance-for-elasticsearch-test-case/54322918#54322918 . Hopefully it will help – Ambuj Jan 23 '19 at 09:29

5 Answers5

18

Background:
I'm writing this answer from my experience of creating a SearchResponse object for the purpose of writing a Java Unit Test. The goal was to take any JSON response object from an Elasticsearch query, marshall that into a SearchResponse object, and Unit Test the business logic of creating a consumable output.

We're using Elasticsearch 6.7, the high-level rest client, and parsing the SearchResponse using Elastic's POJOs (vs just doing a .toString() and manipulating it with GSON or Jackson).

Explanation of the solution:
Elasticsearch's high-level rest client generically parses results from the low-level rest client. The SearchRequest's response JSON is converted into a SearchResponse Object in the RestHighLevelClient on line 129 in the search method. This method calls performRequestAndParseEntity on line 1401, which accepts an entityParser as a CheckedFunction<XContentParser, Resp, IOException>. Finally, we can see that when invoking the entityParser on line 1401, it calls the parseEntity method on line 1714 which determines the XContentType for the entity and ultimately performs the parse. Notably, when the parser is created on line 1726 a registry is passed into the parser. This registry contains all the possible XContent values a response field may be. The registry is created when the RestHighLevelClient is constructed on line 288. The full list of types, including Aggregation types, is listed on line 1748.

Onto the solution:
After reading the Elasticsearch discussion on this, it would appear that if you want to inject a JSON Response from Elastic into the SearchResponse object, it is necessary to create a NamedXContentRegistry and list of XContents testing you have to re-create the parsing. A helper method to do that, sourced from Elastic's discussion:

public static List<NamedXContentRegistry.Entry> getDefaultNamedXContents() {
    Map<String, ContextParser<Object, ? extends Aggregation>> map = new HashMap<>();
    map.put(TopHitsAggregationBuilder.NAME, (p, c) -> ParsedTopHits.fromXContent(p, (String) c));
    map.put(StringTerms.NAME, (p, c) -> ParsedStringTerms.fromXContent(p, (String) c));
    List<NamedXContentRegistry.Entry> entries = map.entrySet().stream()
            .map(entry -> new NamedXContentRegistry.Entry(Aggregation.class, new ParseField(entry.getKey()), entry.getValue()))
            .collect(Collectors.toList());
  return entries;
}

The map in the above code needs to have ALL of the Aggregations that's necessary for your test. There are more than two, two are here for brevity.

Using this helper getNamedXContents() method, you can now use the following method to take a JSON String and inject it into the SearchResponse. Also sourced from Elastic's Discussion:

public static SearchResponse getSearchResponseFromJson(String jsonResponse){
    try {
        NamedXContentRegistry registry = new NamedXContentRegistry(getDefaultNamedXContents());
        XContentParser parser = JsonXContent.jsonXContent.createParser(registry, jsonResponse);
        return SearchResponse.fromXContent(parser);
    } catch (IOException e) {
        System.out.println("exception " + e);
    }catch (Exception e){
        System.out.println("exception " + e);
    }
    return new SearchResponse();
}

Applying the solution with an Aggregation result:
Elasticsearch needs a hint to know what type of aggregation to parse this as. The hint is provided by elastic when adding ?typed_keys to the query. An example is shown in the Elasticsearch documentation on Aggregation Type Hints.

To inject the JSON String into a SearchResponse object, one must (1) Use the methods above and (2) Inject a string with type hints in it.

Primary Sources:

  1. https://discuss.elastic.co/t/elasticsearch-json-response-to-searchresponse-object/124394/6
  2. https://github.com/elastic/elasticsearch/blob/master/client/rest-high-level/src/main/java/org/elasticsearch/client/RestHighLevelClient.java
  3. https://github.com/elastic/elasticsearch/blob/master/test/framework/src/main/java/org/elasticsearch/test/InternalAggregationTestCase.java
  4. https://www.elastic.co/guide/en/elasticsearch/reference/current/returning-aggregation-type.html

Note: There are a lot of articles from circa-2015 that say this is impossible. That is obviously incorrect.

technocrat
  • 3,513
  • 5
  • 25
  • 39
  • 1
    Hi, I tried to do as your suggestion, however, the aggregation now has type as ParsedStringTerms instead of StringTerms. Do you have any idea how can I convert it back to StringTerms? – Minh Khoi Jul 31 '19 at 06:40
  • What version of Elasticsearch are you on? It's listed here: https://github.com/elastic/elasticsearch/blob/master/client/rest-high-level/src/main/java/org/elasticsearch/client/RestHighLevelClient.java#L1776 If you add that to your getDefaultNamedXContents() map, does it work? – technocrat Jul 31 '19 at 11:34
  • 1
    I'm using Elasticsearch 6.4.3. Moreover, the method ParsedStringTerms.fromXContent() return a ParsedStringTerm object instead of a StringTerm object. – Minh Khoi Aug 01 '19 at 07:33
  • 1
    I see the same behavior as @MinhKhoi. A ParsedStringTerms cannot be cast to a StringTerm object. – Matt Hulse Jul 16 '20 at 16:44
2

Based on the answer above, I managed to do it like this:

I wrote an JSON like this:

XContentBuilder builder = XContentFactory.jsonBuilder();
response.toXContent(builder, ToXContent.EMPTY_PARAMS);
String result = Strings.toString(builder);

and then I manged to read it like this:

 try {
     NamedXContentRegistry registry = new NamedXContentRegistry(getDefaultNamedXContents());
     XContentParser parser = JsonXContent.jsonXContent.createParser(registry, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, result);
     SearchResponse searchResponse = SearchResponse.fromXContent(parser);
 } catch (IOException e) {
     System.out.println("exception " + e);
 } catch (Exception e) {
     System.out.println("exception " + e);
 }

public static List<NamedXContentRegistry.Entry> getDefaultNamedXContents() {
    Map<String, ContextParser<Object, ? extends Aggregation>> map = new HashMap<>();
    map.put(TopHitsAggregationBuilder.NAME, (p, c) -> ParsedTopHits.fromXContent(p, (String) c));
    map.put(StringTerms.NAME, (p, c) -> ParsedStringTerms.fromXContent(p, (String) c));
    List<NamedXContentRegistry.Entry> entries = map.entrySet().stream()
            .map(entry -> new NamedXContentRegistry.Entry(Aggregation.class, new ParseField(entry.getKey()), entry.getValue()))
            .collect(Collectors.toList());
    return entries;
}

Hope it works :)

1

you need to add ?typed_keys end of your request URL, such as /cranking/_search?typed_keys, take a look at this reference.

and you'd better add more parse registry in the NamedXContentRegistry just like the framework source code. Following is all of the registry entry:

private List<NamedXContentRegistry.Entry> getProvidedNamedXContents() {
    List<NamedXContentRegistry.Entry> entries = new ArrayList<>();

    for (NamedXContentProvider service : ServiceLoader.load(NamedXContentProvider.class)) {
        entries.addAll(service.getNamedXContentParsers());
    }

    return entries;
}

private NamedXContentRegistry getDefaultNamedXContentRegistry() {
    List<NamedXContentRegistry.Entry> entries = new ArrayList<>();
    entries.addAll(getDefaultNamedXContents());
    entries.addAll(getProvidedNamedXContents());
    return new NamedXContentRegistry(entries);
}


private List<NamedXContentRegistry.Entry> getDefaultNamedXContents() {
    Map<String, ContextParser<Object, ? extends Aggregation>> map = new HashMap<>();
    map.put("cardinality", (p, c) -> ParsedCardinality.fromXContent(p, (String) c));
    map.put("hdr_percentiles", (p, c) -> ParsedHDRPercentiles.fromXContent(p, (String) c));
    map.put("hdr_percentile_ranks", (p, c) -> ParsedHDRPercentileRanks.fromXContent(p, (String) c));
    map.put("tdigest_percentiles", (p, c) -> ParsedTDigestPercentiles.fromXContent(p, (String) c));
    map.put("tdigest_percentile_ranks", (p, c) -> ParsedTDigestPercentileRanks.fromXContent(p, (String) c));
    map.put("percentiles_bucket", (p, c) -> ParsedPercentilesBucket.fromXContent(p, (String) c));
    map.put("min", (p, c) -> ParsedMin.fromXContent(p, (String) c));
    map.put("max", (p, c) -> ParsedMax.fromXContent(p, (String) c));
    map.put("sum", (p, c) -> ParsedSum.fromXContent(p, (String) c));
    map.put("avg", (p, c) -> ParsedAvg.fromXContent(p, (String) c));
    map.put("value_count", (p, c) -> ParsedValueCount.fromXContent(p, (String) c));
    map.put("simple_value", (p, c) -> ParsedSimpleValue.fromXContent(p, (String) c));
    map.put("derivative", (p, c) -> ParsedDerivative.fromXContent(p, (String) c));
    map.put("bucket_metric_value", (p, c) -> ParsedBucketMetricValue.fromXContent(p, (String) c));
    map.put("stats", (p, c) -> ParsedStats.fromXContent(p, (String) c));
    map.put("stats_bucket", (p, c) -> ParsedStatsBucket.fromXContent(p, (String) c));
    map.put("extended_stats", (p, c) -> ParsedExtendedStats.fromXContent(p, (String) c));
    map.put("extended_stats_bucket", (p, c) -> ParsedExtendedStatsBucket.fromXContent(p, (String) c));
    map.put("geo_bounds", (p, c) -> ParsedGeoBounds.fromXContent(p, (String) c));
    map.put("geo_centroid", (p, c) -> ParsedGeoCentroid.fromXContent(p, (String) c));
    map.put("histogram", (p, c) -> ParsedHistogram.fromXContent(p, (String) c));
    map.put("date_histogram", (p, c) -> ParsedDateHistogram.fromXContent(p, (String) c));
    map.put("sterms", (p, c) -> ParsedStringTerms.fromXContent(p, (String) c));
    map.put("lterms", (p, c) -> ParsedLongTerms.fromXContent(p, (String) c));
    map.put("dterms", (p, c) -> ParsedDoubleTerms.fromXContent(p, (String) c));
    map.put("missing", (p, c) -> ParsedMissing.fromXContent(p, (String) c));
    map.put("nested", (p, c) -> ParsedNested.fromXContent(p, (String) c));
    map.put("reverse_nested", (p, c) -> ParsedReverseNested.fromXContent(p, (String) c));
    map.put("global", (p, c) -> ParsedGlobal.fromXContent(p, (String) c));
    map.put("filter", (p, c) -> ParsedFilter.fromXContent(p, (String) c));
    map.put("sampler", (p, c) -> ParsedSampler.fromXContent(p, (String) c));
    map.put("geohash_grid", (p, c) -> ParsedGeoHashGrid.fromXContent(p, (String) c));
    map.put("range", (p, c) -> ParsedRange.fromXContent(p, (String) c));
    map.put("date_range", (p, c) -> ParsedDateRange.fromXContent(p, (String) c));
    map.put("geo_distance", (p, c) -> ParsedGeoDistance.fromXContent(p, (String) c));
    map.put("filters", (p, c) -> ParsedFilters.fromXContent(p, (String) c));
    map.put("adjacency_matrix", (p, c) -> ParsedAdjacencyMatrix.fromXContent(p, (String) c));
    map.put("siglterms", (p, c) -> ParsedSignificantLongTerms.fromXContent(p, (String) c));
    map.put("sigsterms", (p, c) -> ParsedSignificantStringTerms.fromXContent(p, (String) c));
    map.put("scripted_metric", (p, c) -> ParsedScriptedMetric.fromXContent(p, (String) c));
    map.put("ip_range", (p, c) -> ParsedBinaryRange.fromXContent(p, (String) c));
    map.put("top_hits", (p, c) -> ParsedTopHits.fromXContent(p, (String) c));
    map.put("composite", (p, c) -> ParsedComposite.fromXContent(p, (String) c));
    List<NamedXContentRegistry.Entry> entries = map.entrySet().stream()
            .map((entry) -> new NamedXContentRegistry.Entry(Aggregation.class, new ParseField((String) entry.getKey()), entry.getValue()))
            .collect(Collectors.toList());
    entries.add(new NamedXContentRegistry.Entry(Suggest.Suggestion.class, new ParseField("term"), (parser, context) -> TermSuggestion.fromXContent(parser, (String) context)));
    entries.add(new NamedXContentRegistry.Entry(Suggest.Suggestion.class, new ParseField("phrase"), (parser, context) -> PhraseSuggestion.fromXContent(parser, (String) context)));
    entries.add(new NamedXContentRegistry.Entry(Suggest.Suggestion.class, new ParseField("completion"), (parser, context) -> CompletionSuggestion.fromXContent(parser, (String) context)));
    return entries;
}
Ole Pannier
  • 3,208
  • 9
  • 22
  • 33
Damon DG
  • 11
  • 1
0

I ran into the same problem using ElasticSearch 7.15. The answer given by technocrat above, really helped to crack this down but it still didn't work for me as the aggregates were not recognized. The aggregations in my JSON looked like the one below:

{
  ...
  "aggregations": {
    "my-agg-name": {                 
      "buckets": []
    }
  }
}

Like explained in this article the issue was due to the expectation of aggregations to come back in the response as sterms#my-agg-name while the original JSON contained just the name of the aggregation my-agg-name. Using the code above and also adding the relevant aggregation type to the registry did not work.

I found out that a simple solution consists in returning the aggregation type in the response. So as per the example in the official documentation for the aggregation feature, adding typed_key to my aggregation request:

GET /my-index-000001/_search?typed_keys
{
  "aggs": {
    "my-agg-name": {
      "histogram": {
        "field": "my-field",
        "interval": 1000
      }
    }
  }
} 

will return relevant aggregate types in the response as you can see below (histogram#my-agg-name):

{
  ...
  "aggregations": {
    "histogram#my-agg-name": {                 
      "buckets": []
    }
  }
}

Now the parser will recognize the relevant aggregation type and the conversion will be successful. If it doesn't then please make sure that the aggregation type returned in the answer is included in the registry map as per previous answers. This worked for me with ElasticSearch 7.15.

AR1
  • 4,507
  • 4
  • 26
  • 42
0

Let me approach the problem from a different direction. What do you need SearchResponse for? In most of the cases you need it in test to be returned from for example RestHighLevelClient search methods when you want to test a service logic.

If so, then why not to mock the SearchResponse? Let's say you receive a SearchResponse for the following ES query:

GET index_name/_search
{
  "aggs": {
    "AGG_NAME": {
      "terms": {
        "field": "someField"
      }
    }
  }
}

and the response:

{
  "aggregations" : {
    "AGG_NAME": {
      "buckets": [
        {
          "key": "AAA",
          "doc_count": 100
        },
        {
          "key": "BBB",
          "doc_count": 200
        }
      ]
    }
  }
}

Now in the service you convert it into the Map<String, Long> having the bucket count:

    Map<String, Long> searchResponseToMap(SearchResponse searchResponse) {
        var buckets = Optional.ofNullable(searchResponse)
            .map(SearchResponse::getAggregations)
            .map(aggregations -> (ParsedStringTerms) aggregations.get("AGG_NAME"))
            .map(ParsedTerms::getBuckets)
            .orElse(List.of());

        return buckets.stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));
    }

In test instead of creating a SearchResponse as suggested in the other answers you could mock it like this:

@Test
void shouldReturnAggregations() {
    // AAA/BBB buckets
    Terms.Bucket aaa = mock(Terms.Bucket.class);
    ParsedStringTerms.ParsedBucket bbb = mock(ParsedStringTerms.ParsedBucket.class);
    when(aaa.getKeyAsString()).thenReturn("AAA");
    when(aaa.getDocCount()).thenReturn(100L);
    when(bbb.getKeyAsString()).thenReturn("BBB");
    when(bbb.getDocCount()).thenReturn(200L);
    
    // AGG_NAME
    ParsedStringTerms aggName = mock(ParsedStringTerms.class);
    when(aggName.getBuckets()).thenAnswer(invocation -> List.of(aaa, bbb));
    
    Aggregations aggregations = mock(Aggregations.class);
    when(aggregations.get("AGG_NAME")).thenReturn(aggName);
    
    // searchResponse
    SearchResponse searchResponse = mock(SearchResponse.class);
    when(searchResponse.getAggregations()).thenReturn(aggregations);
    
    when(esRepository.search(any())).thenReturn(searchResponse);
    
    // When
    Map<String, Long> result = service.fetchAndTransform();
    
    // Then
    assertThat(result).containsEntry("AAA", 100L)
        .containsEntry("BBB", 200L);
}

jwpol
  • 1,188
  • 10
  • 22