Elasticsearch: get latest for a value

Question

I would like to store some mobile app version data in Elasticsearch and visualize it in Kibana/Grafana. The purpose is to know how many clients use each app version.

Say I have these "mobile app init" event entries in Elasticsearch (simplified):

* clientId: ABC, clientVersion: 1.2.3, time: 2018-01-01
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-02
* clientId: GHI, clientVersion: 1.2.3, time: 2018-01-03
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-04
* clientId: GHI, clientVersion: 1.2.4, time: 2018-01-05

And I would like to have a visualization in Kibana/Grafana like:

* version 1.2.3: 2 installations
* version 1.2.4: 1 installation

According to the data above, version 1.2.3 is currently in use by ABC and DEF, and version 1.2.4 is in use by GHI.

* ABC has only 1 entry, with 1.2.3.
* DEF has 2 entries, both with 1.2.3. Since I want to see app version usage per client, duplicate entries should be ignored.
* GHI has 2 entries, one with 1.2.3 and one with 1.2.4. Since GHI's latest entry is 1.2.4, its 1.2.3 entry is ignored.

If I were doing this in code on in-memory data, without Elasticsearch, the algorithm would be:

1. Get the latest entry for each client.
2. Group those entries by version and count them.
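The two steps above can be sketched in Python on the simplified events from the question. This is a minimal in-memory illustration only; `installations_per_version` is a hypothetical helper name, and it assumes the `time` values compare chronologically (here as `datetime.date`).

```python
from collections import Counter
from datetime import date

# Simplified "mobile app init" events from the question.
events = [
    {"clientId": "ABC", "clientVersion": "1.2.3", "time": date(2018, 1, 1)},
    {"clientId": "DEF", "clientVersion": "1.2.3", "time": date(2018, 1, 2)},
    {"clientId": "GHI", "clientVersion": "1.2.3", "time": date(2018, 1, 3)},
    {"clientId": "DEF", "clientVersion": "1.2.3", "time": date(2018, 1, 4)},
    {"clientId": "GHI", "clientVersion": "1.2.4", "time": date(2018, 1, 5)},
]


def installations_per_version(events):
    """Count clients per app version, using only each client's latest event."""
    # Step 1: keep the most recent event for each client.
    latest = {}
    for event in events:
        current = latest.get(event["clientId"])
        if current is None or event["time"] > current["time"]:
            latest[event["clientId"]] = event
    # Step 2: group the surviving events by version and count them.
    return Counter(event["clientVersion"] for event in latest.values())


print(installations_per_version(events))  # Counter({'1.2.3': 2, '1.2.4': 1})
```

The result matches the desired visualization: 2 installations of 1.2.3 (ABC, DEF) and 1 of 1.2.4 (GHI).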

Questions:

1. How can I get the desired output with Elasticsearch queries?
2. If question 1 is not possible, how can I create this visualization in Kibana/Grafana?

Please note that I don't want to use update/upsert operations; I just want to add documents to Elasticsearch.

UPDATE: more info on this question.

From this question, I can see that de-duplicating data is possible using the top_hits aggregation: "Remove duplicate documents from a search in Elasticsearch".

However, de-duplicating by clientId using top_hits and then grouping the results by clientVersion using terms is not possible, because the top_hits aggregation does not accept sub-aggregations (search for the error "[top_hits] cannot accept sub-aggregations").

This is how far I got:

```
GET /metric/_search
{
  "size": 0,
  "aggs": {
    "latestEntriesPerClients": {
      "terms": { "field": "clientid" },
      "aggs": {
        "1": {
          "top_hits": {
            "sort": [{ "date": { "order": "desc" } }],
            "size": 1
          }
          // THIS WON'T WORK: top_hits cannot have sub-aggregations
          // , "aggs": {
          //   "NAME": {
          //     "terms": { "field": "clientVersion" }
          //   }
          // }
        }
      }
    }
  }
}
```

Simplified output:

* clientId: ABC, clientVersion: 1.2.3, time: 2018-01-01
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-04
* clientId: GHI, clientVersion: 1.2.4, time: 2018-01-05

Now, how can I pipe this into another aggregation? I tried pipeline aggregations, but could not get the values above grouped by version and counted.
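Since a top_hits result cannot be fed into a further terms aggregation, one workaround (not a pure Elasticsearch solution, and not directly usable as a Kibana visualization) is to run the terms + top_hits query and do the final grouping on the client side. The Python sketch below assumes the response has already been parsed into a dict (for example via the official `elasticsearch` Python client or a raw HTTP call) and relies on the aggregation names from the query, `latestEntriesPerClients` and `1`; `count_latest_versions` is a hypothetical helper. Because a terms aggregation returns only 10 buckets by default, the sketch fails loudly when `sum_other_doc_count` shows that some clients were left out.

```python
from collections import Counter


def count_latest_versions(response):
    """Count clients per version from a terms + top_hits search response.

    Assumes the aggregation names used in the question's query:
    "latestEntriesPerClients" (terms on the client ID) with a top_hits
    sub-aggregation named "1" (size 1, sorted by date descending).
    """
    agg = response["aggregations"]["latestEntriesPerClients"]
    if agg.get("sum_other_doc_count", 0) > 0:
        # Some client IDs fell outside the terms "size", so counts would be incomplete.
        raise ValueError("terms size too small: some clients were not returned")
    counts = Counter()
    for bucket in agg["buckets"]:
        # With size 1 and a descending date sort, the single hit is the latest entry.
        latest_hit = bucket["1"]["hits"]["hits"][0]
        counts[latest_hit["_source"]["clientVersion"]] += 1
    return counts


# Usage with a trimmed response shaped like the simplified output above.
response = {
    "aggregations": {
        "latestEntriesPerClients": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "ABC", "1": {"hits": {"hits": [{"_source": {"clientVersion": "1.2.3"}}]}}},
                {"key": "DEF", "1": {"hits": {"hits": [{"_source": {"clientVersion": "1.2.3"}}]}}},
                {"key": "GHI", "1": {"hits": {"hits": [{"_source": {"clientVersion": "1.2.4"}}]}}},
            ],
        }
    }
}
print(count_latest_versions(response))  # Counter({'1.2.3': 2, '1.2.4': 1})
```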

[Field collapsing](https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-request-collapse.html) would be the best choice for getting only the most recent client version per client ID (also in terms of resource usage). However, aggregations ignore `collapse`, so this won't get you the final count. IMO you'd need to store the collapsed results and run the aggregation on top of them. - xeraa, Jan 22 '18 at 23:30
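A sketch of the collapse approach suggested in the comment: the search collapses on the client ID and sorts by time descending, so each returned hit is that client's latest entry, and the counting happens client-side because aggregations ignore `collapse`. The field names `clientId` and `time` are taken from the simplified example and are assumed to be mapped as `keyword` and `date` (collapse requires a keyword or numeric field with doc values); the helper names are hypothetical. A single request returns at most `index.max_result_window` hits (10,000 by default), so more clients than that would need pagination.

```python
from collections import Counter


def build_collapse_query(size=10000):
    """Search body returning one hit (the latest) per client.

    Assumes hypothetical mappings: "clientId" as keyword, "time" as date.
    """
    return {
        "size": size,
        "query": {"match_all": {}},
        "collapse": {"field": "clientId"},
        # The sort decides which document represents each collapsed group.
        "sort": [{"time": {"order": "desc"}}],
        "_source": ["clientId", "clientVersion", "time"],
    }


def count_versions_from_collapsed_hits(response):
    """Count clients per version from a collapsed search response.

    Each hit is one client's latest entry. hits.total counts matching
    documents before collapsing, so it is not used here.
    """
    return Counter(hit["_source"]["clientVersion"] for hit in response["hits"]["hits"])


# Usage with a trimmed response: one hit per client after collapsing.
collapsed = {
    "hits": {
        "hits": [
            {"_source": {"clientId": "DEF", "clientVersion": "1.2.3"}},
            {"_source": {"clientId": "GHI", "clientVersion": "1.2.4"}},
            {"_source": {"clientId": "ABC", "clientVersion": "1.2.3"}},
        ]
    }
}
print(count_versions_from_collapsed_hits(collapsed))  # Counter({'1.2.3': 2, '1.2.4': 1})
```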
