
The goal is to build an Elasticsearch index that holds only the most recent document from each group of related documents, so that it tracks the current state of some monitoring counters and states.

I have crafted a simple Elasticsearch aggregation query:

{
  "size": 0,
  "aggs": {
    "group_by_monitor": {
      "terms": {
        "field": "monitor_name"
      },
      "aggs": {
        "get_latest": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

It groups related documents into buckets and selects the most recent document for each bucket.
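To make the shape of the result concrete, here is a minimal Python sketch of how the latest document per bucket could be pulled out of the response. The `sample_response` below is hand-written with made-up values, and the helper name `latest_per_monitor` is hypothetical; only the aggregation names (`group_by_monitor`, `get_latest`) come from the query above. Note that with `"size": 0` the top-level `hits` array is empty and everything of interest sits under `aggregations`. Also, a `terms` aggregation returns only the top 10 buckets by default, so a `size` would probably be needed on it when there are more monitors than that.

```python
from typing import Any, Dict


def latest_per_monitor(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each bucket key (monitor_name) to the _source of its newest hit."""
    latest: Dict[str, Dict[str, Any]] = {}
    buckets = response["aggregations"]["group_by_monitor"]["buckets"]
    for bucket in buckets:
        hits = bucket["get_latest"]["hits"]["hits"]
        if hits:  # top_hits with "size": 1 yields at most one hit per bucket
            latest[bucket["key"]] = hits[0]["_source"]
    return latest


# Hand-written sample shaped like a search response (all values are made up).
sample_response = {
    "hits": {"total": 3, "hits": []},  # empty because the query uses "size": 0
    "aggregations": {
        "group_by_monitor": {
            "buckets": [
                {
                    "key": "cpu",
                    "doc_count": 2,
                    "get_latest": {"hits": {"total": 2, "hits": [
                        {"_index": "monitoring", "_id": "a2",
                         "_source": {"monitor_name": "cpu",
                                     "timestamp": "2016-04-08T17:00:00Z",
                                     "value": 93}},
                    ]}},
                },
                {
                    "key": "disk",
                    "doc_count": 1,
                    "get_latest": {"hits": {"total": 1, "hits": [
                        {"_index": "monitoring", "_id": "b1",
                         "_source": {"monitor_name": "disk",
                                     "timestamp": "2016-04-08T16:30:00Z",
                                     "value": 41}},
                    ]}},
                },
            ]
        }
    },
}

for name, doc in latest_per_monitor(sample_response).items():
    print(name, doc["timestamp"], doc["value"])
```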

Here are the different ideas I had to get the job done:

  1. directly use the aggregation query to push the results into the index, but this does not seem possible: see "Is it possible to put the results of an ElasticSearch aggregation back into the index?"
  2. use the Logstash Elasticsearch input plugin to execute the aggregation query and the Elasticsearch output plugin to push into the index, but it seems the input plugin only looks at the hits field and cannot handle aggregation results: see "Aggregation Query possible input ES plugin !"
  3. use the Logstash http_poller plugin to get a JSON document, but it does not seem to allow specifying a body for the HTTP request!
  4. use the Logstash exec plugin to execute cURL commands to get the JSON, but this seems quite cumbersome and is my last resort.
  5. use the NEST API to build a basic application that will do polling, extract results, clean them and inject the resulting documents into the target index, but I'd like to avoid adding a new tool to maintain.
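For the "extract, clean and inject" step of the last idea, a rough Python sketch of the injection side might look like the following. It only builds the newline-delimited body expected by the Elasticsearch `_bulk` API and does not send it. The function name `build_bulk_body`, the target index name `current_state`, and the choice of using the monitor name as the document `_id` (so that each polling run overwrites the previous state rather than piling up documents) are all assumptions made for illustration. Older Elasticsearch versions also expect a `_type` in the action line.

```python
import json
from typing import Any, Dict


def build_bulk_body(latest: Dict[str, Dict[str, Any]], target_index: str) -> str:
    """Build an NDJSON _bulk body with one "index" action per monitor."""
    lines = []
    for monitor_name, source in sorted(latest.items()):
        # Action line: the monitor name is used as _id (an assumption), so
        # re-running the job replaces the document instead of adding a new one.
        lines.append(json.dumps({"index": {"_index": target_index, "_id": monitor_name}}))
        # Source line: the latest document for that monitor, unchanged.
        lines.append(json.dumps(source))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


# Made-up input: monitor name -> most recent document for that monitor.
latest_docs = {
    "cpu": {"monitor_name": "cpu", "timestamp": "2016-04-08T17:00:00Z", "value": 93},
    "disk": {"monitor_name": "disk", "timestamp": "2016-04-08T16:30:00Z", "value": 41},
}

body = build_bulk_body(latest_docs, "current_state")
print(body, end="")
```

The resulting string would then be POSTed to the cluster's `_bulk` endpoint with a `Content-Type: application/x-ndjson` header, by whatever scheduler or polling loop drives the job.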

Is there a reasonably simple way of accomplishing this?

Pragmateek
  • [Watcher](https://www.elastic.co/guide/en/watcher/current/index.html)? – Andrei Stefan Apr 08 '16 at 17:13
  • @AndreiStefan Thanks but AFAIK Watcher won't help for this use case. Moreover we don't have it (yet?) deployed on our infrastructure. For alerting we use **ElastAlert** which does the job perfectly too. – Pragmateek Apr 08 '16 at 17:15
  • I'm not suggesting Watcher for alerting, but for being able to [query the indices](https://www.elastic.co/guide/en/watcher/current/changing-inputs.html#loading-search-results) at a regular interval, do some [basic transformation](https://www.elastic.co/guide/en/watcher/current/using-transforms.html) on the resulting data and [index it back into Elasticsearch](https://www.elastic.co/guide/en/watcher/current/actions.html#actions-index). – Andrei Stefan Apr 08 '16 at 17:26
  • @AndreiStefan Thanks for these elements. Indeed Watcher seems a good alternative. But as said before we don't have it yet. :'( – Pragmateek Apr 08 '16 at 17:28
  • Hey. I have exactly the same issue. Did you find a way to directly use the aggregation query to push the results into the index, or a workaround? Thanks – Ovidiu Rudi Nov 03 '16 at 13:30
  • @OvidiuRudi Not a direct way, I had to build a dedicated program in C# to do the plumbing. – Pragmateek Nov 04 '16 at 09:17

1 Answer


Edit the logstash.conf file as follows:

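input {
  elasticsearch {
    hosts => "localhost"
    index => "source_index_name"
    type => "index_type"
    # Replace {Query} with the JSON body of the search request
    query => '{Query}'
    size => 500
    scroll => "5m"
    # Keep each hit's _index, _type and _id under [@metadata]
    docinfo => true
  }
}

output {
  elasticsearch {
    index => "target_index_name"
    # Reuse the source document's _id in the target index
    document_id => "%{[@metadata][_id]}"
  }
}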
Akshay Patil
  • Is it working now thanks to a fix of Logstash? Because at the time of the question Logstash was not handling aggregations. – Pragmateek Feb 06 '17 at 14:11
  • Yup, it's working. I tried it only yesterday on ELK (5.1.1). – Akshay Patil Feb 07 '17 at 03:47
  • OK, I trust you, you get my +1. :) – Pragmateek Feb 07 '17 at 16:29
  • I don't understand your answer. I've tested it on Elasticsearch 5.1.1 with Logstash 5.6.1 and it doesn't work. Where are you telling Logstash to use the aggregation result instead of the 'hits' array? – nap.gab Sep 26 '17 at 14:23
  • For this, I used Elasticsearch 5.5. – Akshay Patil Sep 27 '17 at 14:50
  • @AkshayPatil It is not a problem with the Elasticsearch version. As you can see from the source code [link](https://github.com/logstash-plugins/logstash-input-elasticsearch/blob/master/lib/logstash/inputs/elasticsearch.rb#L168), Logstash simply scrolls the **hits** array, so it is impossible to read the aggregations from Elasticsearch, which are placed in the **aggregations** section of the query response. The configuration that you posted simply copies each document matching the query from source_index_name to target_index_name, so it completely ignores the aggregation values. – nap.gab Oct 02 '17 at 10:09