
I am querying Elasticsearch using the Java API and am getting a lot of duplicate values. I want only the unique (distinct) values from the query. How can I get distinct values with the QueryBuilder?

Here is my Java code, which returns duplicate values:

// Collect the link IDs to search for
List<Integer> link_id_array = new ArrayList<>();
for (Replacement link_id : linkIDList) {
    link_id_array.add(link_id.getLink_id());
}

// Match documents whose "id" field equals any of the collected IDs
QueryBuilder qb2 = QueryBuilders.boolQuery()
        .must(QueryBuilders.termsQuery("id", link_id_array));

I am using Elasticsearch 6.2.3 with the RestHighLevelClient.

Polynomial Proton
Karthikeyan

1 Answer

Way 1: Use the aggregations API, specifically a terms aggregation, which returns one bucket per distinct value of a field:

Sample query to get the distinct values of the Email_client field:

{
  "query" : {
    "match_all" : { }
  },
  "aggregations" : {
    "label_agg" : {
      "terms" : {
        "field" : "Email_client",
        "size" : 100
      }
    }
  }
}
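To make the result shape concrete: the response holds one bucket per distinct Email_client value, each with a doc_count, ordered by doc_count (descending) by default and capped at "size". The sketch below models only that bucketing in plain Java so it runs without a cluster; the Bucket class, the tie-breaking by key, and the sample values are assumptions for illustration, and it ignores the per-shard approximation Elasticsearch can apply to terms counts on multi-shard indices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermsAggSketch {
    // Hypothetical model of one terms bucket: a distinct value and how many documents had it.
    static final class Bucket {
        final String key;
        final long docCount;

        Bucket(String key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    // One bucket per distinct value, ordered by docCount descending (ties broken by key here,
    // for determinism), truncated to at most `size` buckets.
    static List<Bucket> terms(List<String> fieldValues, int size) {
        Map<String, Long> counts = new TreeMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1L, Long::sum);
        }
        List<Bucket> buckets = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            buckets.add(new Bucket(e.getKey(), e.getValue()));
        }
        buckets.sort((a, b) -> a.docCount != b.docCount
                ? Long.compare(b.docCount, a.docCount)
                : a.key.compareTo(b.key));
        return new ArrayList<>(buckets.subList(0, Math.min(size, buckets.size())));
    }

    public static void main(String[] args) {
        // Hypothetical sample values of the Email_client field, one per document
        List<String> emailClients = Arrays.asList(
                "outlook", "gmail", "outlook", "thunderbird", "gmail", "outlook");
        for (Bucket b : terms(emailClients, 100)) {
            System.out.println(b.key + " " + b.docCount); // outlook 3, gmail 2, thunderbird 1
        }
    }
}
```

The bucket keys are the distinct values; reading getKeyAsString() from each bucket is all the Java client code needs to do.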

Java code sample (this uses the TransportClient prepareSearch API, not the RestHighLevelClient):

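import java.util.List;
import java.util.stream.Collectors;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

class EmailClientQueries {
    // Returns up to 100 distinct values of Email_client (must be a keyword field), one per bucket
    static List<String> distinctEmailClients(Client client) {
        SearchResponse response = client.prepareSearch("emails")
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(0) // only the aggregation is needed, not the hits
                .addAggregation(AggregationBuilders.terms("label_agg")
                        .field("Email_client").size(100))
                .get();
        Terms terms = response.getAggregations().get("label_agg"); // Terms instead of a StringTerms cast
        return terms.getBuckets().stream()
                .map(Terms.Bucket::getKeyAsString)
                .collect(Collectors.toList());
    }
}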

Way 2: Use the cardinality aggregation, which returns the number of distinct values. Sample query:

{
  "size": 0,
  "aggs": {
    "distinct": {
      "cardinality": {
        "field": "Email_client"
      }
    }
  }
}
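For comparison, the number that cardinality estimates (Elasticsearch approximates it with HyperLogLog++) can be computed exactly in memory. A minimal sketch with a hypothetical helper and sample values, showing that the result is a count rather than a list of values:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

class CardinalitySketch {
    // Exact count of distinct values, computed in memory. The cardinality aggregation
    // returns an approximation of this same number, not the values themselves.
    static long distinctCount(List<String> fieldValues) {
        return new HashSet<>(fieldValues).size();
    }

    public static void main(String[] args) {
        // Hypothetical sample values of the Email_client field
        List<String> emailClients = Arrays.asList("outlook", "gmail", "outlook");
        System.out.println(distinctCount(emailClients)); // prints 2
    }
}
```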

Java code sample:

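AggregationBuilder agg11 = AggregationBuilders.cardinality("distinct").field("Email_client");
SearchResponse response11 = client.prepareSearch("emails") // multiple index names can be given here
        .setQuery(QueryBuilders.matchAllQuery())
        .addAggregation(agg11)
        .setSize(0) // only the aggregation result is needed, not the hits
        .get();
// Cardinality is org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality in 6.x
Cardinality distinct = response11.getAggregations().get("distinct");
// A single approximate count of distinct values, not the values themselves
long distinctCount = distinct.getValue();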
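Both aggregations return values or counts rather than documents. If what is actually needed is one document per distinct "id" (as with the termsQuery in the question), one simple fallback, assumed here rather than taken from the answer, is to drop duplicates on the client after the search. A minimal sketch; the Hit class is a hypothetical stand-in for Elasticsearch's SearchHit so the example runs without a cluster:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a search hit: just the "id" field value and the document source.
class Hit {
    final int id;
    final String source;

    Hit(int id, String source) {
        this.id = id;
        this.source = source;
    }
}

class ClientSideDistinct {
    // Keeps the first hit for each distinct id, preserving the order the hits arrived in.
    static List<Hit> distinctById(List<Hit> hits) {
        Map<Integer, Hit> firstById = new LinkedHashMap<>();
        for (Hit hit : hits) {
            firstById.putIfAbsent(hit.id, hit);
        }
        return new ArrayList<>(firstById.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(1, "a"), new Hit(2, "b"), new Hit(1, "c"));
        for (Hit hit : distinctById(hits)) {
            System.out.println(hit.id + " " + hit.source); // prints "1 a", then "2 b"
        }
    }
}
```

With the real client, each id would come from something like hit.getSourceAsMap().get("id") on a SearchHit. This only deduplicates the hits actually returned, so it suits small result sets; it does not change what Elasticsearch counts as a hit.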
  • Way 1 throws an exception: "invalid term-aggregator order path [_key]. Unknown aggregation [_key]"}],"type":"search_phase_execution_exception" – rogger2016 Jun 11 '19 at 08:23
  • @rogger2016 I am facing the same error: {"type":"aggregation_execution_exception","reason":"Invalid term-aggregator order path [_key]. Unknown aggregation [_key]"}. Did you find a solution? – Siva Jan 02 '20 at 19:24
  • The second approach gives duplicate values. Neither solution works for me. – Siva Jan 02 '20 at 20:36