3

I am facing an issue in ElasticSearch aggregation. We are using RestHighLevelClient for querying ElasticSearch in Java.

Exception is -

ElasticsearchStatusException[Elasticsearch exception [type=search_phase_execution_exception, reason=]]; nested: ElasticsearchException[Elasticsearch exception [type=too_many_buckets_exception, reason=Trying to create too many buckets. Must be less than or equal to: [20000] but was [20001]. This limit can be set by changing the [search.max_buckets] cluster level setting.]];

I have changed search.max_buckets using a PUT request but still, I am facing this issue.

PUT /_cluster/settings { "persistent" : { "search.max_buckets":20000 } }

As per our requirement first, we have to aggregate data on daily bases then hourly basis then ruleId basis. Aggregation would look like below level -

Day{
    1:00[
       {
       ruleId : 1 ,
       count : 20
       },
       {
       ruleId : 2 ,
       count : 25
       }
    ],
    2:00[
    {
       ruleId : 1 ,
       count : 20
       },
       {
       ruleId : 2 ,
       count : 25
       }
    ]

Now my code is -

    final List<DTO> violationCaseMgmtDtos = new ArrayList<>();
        try {
            RangeQueryBuilder queryBuilders =
                (end_timestmp > 0 ? customTimeRangeQueryBuilder(start_timestmp, end_timestmp, generationTime)
                    : daysTimeRangeQueryBuilder(14, generationTime));

            BoolQueryBuilder boolQuery = new BoolQueryBuilder();
            boolQuery.must(queryBuilders);
            boolQuery.must(QueryBuilders.matchQuery("pvGroupBy", true));
            boolQuery.must(QueryBuilders.matchQuery("pvInformation", false));
            TopHitsAggregationBuilder topHitsAggregationBuilder =
                AggregationBuilders.topHits("topHits").docValueField(policyId).sort(generationTime, SortOrder.DESC);

            TermsAggregationBuilder termsAggregation = AggregationBuilders.terms("distinct").field(policyId).size(10000)
                .subAggregation(topHitsAggregationBuilder);

            DateHistogramAggregationBuilder timeHistogramAggregationBuilder =
                AggregationBuilders.dateHistogram("by_hour").field("eventDateTime")
                    .fixedInterval(DateHistogramInterval.HOUR).subAggregation(termsAggregation);

            DateHistogramAggregationBuilder dateHistogramAggregationBuilder =
                AggregationBuilders.dateHistogram("by_day").field("eventDateTime")
                    .fixedInterval(DateHistogramInterval.DAY).subAggregation(timeHistogramAggregationBuilder);

            SearchRequest searchRequest = new SearchRequest(violationDataModel);
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.aggregation(dateHistogramAggregationBuilder);
            searchSourceBuilder.query(boolQuery);
            searchSourceBuilder.from(offset);
            searchSourceBuilder.size(10000);
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = null;

            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

            ParsedDateHistogram parsedDateHistogram = searchResponse.getAggregations().get("by_day");

            parsedDateHistogram.getBuckets().parallelStream().forEach(dayBucket -> {


                ParsedDateHistogram hourBasedData = dayBucket.getAggregations().get("by_hour");

                hourBasedData.getBuckets().parallelStream().forEach(hourBucket -> {

                    // TimeLine timeLine = new TimeLine();
                    String dateTime = hourBucket.getKeyAsString();
                    // long dateInLong = DateUtil.getMiliSecondFromStringDate(dateTime);
                    // timeLine.setViolationEventTime(dateTime);

                    ParsedLongTerms distinctPolicys = hourBucket.getAggregations().get("distinct");
                    distinctPolicys.getBuckets().parallelStream().forEach(policyBucket -> {

                        DTO violationCaseManagementDTO = new DTO();
                        violationCaseManagementDTO.setDataAggregated(true);
                        violationCaseManagementDTO.setEventDateTime(dateTime);
                        violationCaseManagementDTO.setRuleId(Long.valueOf(policyBucket.getKey().toString()));

                        ParsedTopHits parsedTopHits = policyBucket.getAggregations().get("topHits");
                        SearchHit[] searchHits = parsedTopHits.getHits().getHits();
                        SearchHit searchHit = searchHits[0];

                        String source = searchHit.getSourceAsString();
                        ViolationDataModel violationModel = null;
                        try {
                            violationModel = objectMapper.readValue(source, ViolationDataModel.class);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }

                        violationCaseManagementDTO.setRuleName(violationModel.getRuleName());
                        violationCaseManagementDTO.setGenerationTime(violationModel.getGenerationTime());
                        violationCaseManagementDTO.setPriority(violationModel.getPriority());
                        violationCaseManagementDTO.setStatus(violationModel.getViolationStatus());
                        violationCaseManagementDTO.setViolationId(violationModel.getId());
                        violationCaseManagementDTO.setEntity(violationModel.getViolator());
                        violationCaseManagementDTO.setViolationType(violationModel.getViolationEntityType());
                        violationCaseManagementDTO.setIndicatorsOfAttack( (int)
                            (policyBucket.getDocCount() * violationModel.getNoOfViolatedEvents()));
                        violationCaseMgmtDtos.add(violationCaseManagementDTO);

                    });
                  //  violationCaseMgmtDtos.sort((d1,d2) -> d1.getEventDateTime().compareTo(d2.getEventDateTime()));
                });

            });

            List<DTO> realtimeViolation = findViolationWithoutGrouping(start_timestmp,  end_timestmp,  offset,  size);
            realtimeViolation.stream().forEach(action -> violationCaseMgmtDtos.add(action)); 
        } catch (Exception e) {
            e.printStackTrace();
        }

        if (Objects.nonNull(violationCaseMgmtDtos) && violationCaseMgmtDtos.size() > 0) {
            return violationCaseMgmtDtos.stream()
                .filter(violationDto -> Objects.nonNull(violationDto))
                .sorted((d1,d2) -> d2.getEventDateTime().compareTo(d1.getEventDateTime()))
                .collect(Collectors.toList());
        }
        return violationCaseMgmtDtos;
}

Please help me to resolve this issue.

nitin tyagi
  • 1,176
  • 1
  • 19
  • 52
  • If you had 20001 buckets you need to augment `search.max_buckets` higher than 20000. – Val Oct 04 '19 at 11:20
  • @Val Thanks for the comment. Earlier it gives me error for 10000 bucket then i have increased it to 20000 then it gives me error for 20000 then i have increased it 50000 then it gives me error for 50000. – nitin tyagi Oct 09 '19 at 17:51
  • 1
    I suggest you look into the [`composite` aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html) instead of trying to retrieve all term buckets at once. – Val Oct 09 '19 at 18:02
  • I agreed with Val about composite aggregation usage. The next wizard https://plnkr.co/edit/iJSFP8eRrhC7l7Hx2XOL?p=preview&preview might help you what type of aggregation suites best according to your needs. – mrDinkelman Oct 23 '20 at 07:26

1 Answers1

0

Incase you are using ES version 7.x.x then you can add terminate_after clause to your query to limit the number of buckets into which the data would be divided into. This happens mostly when the data your are trying to aggregate has a high degree of randomness.

If your data contains text then it would better to aggregate on .keyword field (assuming you are using default settings).

POST your_index/_search
{
  "from": 0,
  "query": {
    "match_all": {}
  },
  "size": 0,
  "sort": [
    {
      "your_target_field": {
        "order": "desc"
      }
    }
  ],
  "terminate_after": 10000,
  "version": true,
  "aggs": {
    "title": {
      "terms": {
        "field": "your_target_field.keyword",
        "size": 10000
      }
    }
  }
}
Abhiram
  • 362
  • 1
  • 2
  • 14