I would like to store some mobile app version data on ElasticSearch and visualize it on Kibana/Grafana. The purpose is to know the usage of the app version.
Say I have these "mobile app init" event entries in ElasticSearch (simplified):
* clientId: ABC, clientVersion: 1.2.3, time: 2018-01-01
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-02
* clientId: GHI, clientVersion: 1.2.3, time: 2018-01-03
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-04
* clientId: GHI, clientVersion: 1.2.4, time: 2018-01-05
And I would like to have a visualization in Kibana/Grafana like:
* version 1.2.3: 2 installations
* version 1.2.4: 1 installation
According to data above, version 1.2.3 is currently in use by ABC and DEF. Version 1.2.4 is in use by 1.2.4.
- ABC has only 1 entry with 1.2.3.
- DEF has 2 entries, both with 1.2.3. Since I would like to see the usage of the app version per clients, duplicate data should be ignored.
- GHI has 2 entries, 1 with 1.2.3 and 1 with 1.2.4. But since the latest version entry is 1.2.4, 1.2.3 entry is ignored.
If I was doing this with some code for in memory data without ElasticSearch, the algorithm would be:
- get the latest entry for each client
- group them by the version and count each entry
Questions:
- How to get the desired output with ElasticSearch queries?
- If the thing asked in question#1 is not possible, how to create visualizations on Kibana/Grafana?
Please note that, I don't want to use update/upsert operations. I just would like to simply add documents to ElasticSearch.
UPDATE: more info on this question.
From this question, I can see the de-duplication of data is possible using top_hits
aggregation:
Remove duplicate documents from a search in Elasticsearch
However, de-duplicating by clientId using top_hits
and then grouping by clientId using terms
is not possible. This is because top_hits
aggregation doesn't accept any sub aggregations. Google for [top_hits] cannot accept sub-aggregations"
This is how far I came:
GET /metric/_search
{
"aggs" : {
"latestEntriesPerClients" : {
"terms" : { "field" : "clientid" },
"aggs": {
"1": {
"top_hits": {
"sort": [{
"date": {"order": "desc"}
}],
"size": 1
}
//, THIS WON'T WORK
// "aggs": {
// "NAME": {
// "terms": {"field": "clientVersion"}
// }
//}
}
}
}
}
}
Simplified output:
* clientId: ABC, clientVersion: 1.2.3, time: 2018-01-01
* clientId: DEF, clientVersion: 1.2.3, time: 2018-01-04
* clientId: GHI, clientVersion: 1.2.4, time: 2018-01-05
Now, how can I pipe this into some other aggregation? I tried pipeline aggregations but failed to get these values above grouped and counted.