
I've got an Elasticsearch index that records hits, with a time field and a token field that is unique to each user. I'd like to know, for the last x days, how many unique tokens have had 15 or more hits. ("Brand Lovers" in marketing-speak.)

Any ideas on how to achieve this?

Thanks!

  • Added a generic query; if you want a working query, please provide a few sample documents – Amit Sep 08 '22 at 10:48

1 Answer


You can use a range query combined with a terms aggregation to get the required results:

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "hits": {
                            "gte": 15
                        }
                    }
                },
                {
                    "range": {
                        "time": {
                            "gte": "now-1d/d",
                            "lt": "now"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "distinct_tokens": {
            "terms": {
                "field": "tokens"
            }
        }
    }
}
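
If only the number of distinct tokens is needed, rather than the individual buckets, a cardinality aggregation could be swapped in for the terms aggregation; a minimal sketch, keeping the same placeholder field names (note that cardinality returns an approximate count):

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "hits": {
                            "gte": 15
                        }
                    }
                },
                {
                    "range": {
                        "time": {
                            "gte": "now-1d/d",
                            "lt": "now"
                        }
                    }
                }
            ]
        }
    },
    "aggs": {
        "distinct_token_count": {
            "cardinality": {
                "field": "tokens"
            }
        }
    }
}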
Amit
  • Thanks. I'm not sure why but this seems to use a significant amount of memory, and the query cannot complete. `[parent] Data too large, data for [] would be [1328218080/1.2gb], which is larger than the limit of [1020054732/972.7mb], real usage: [1029342112/981.6mb], new bytes reserved: [298875968/285mb], usages [request=298888256/285mb, fielddata=411157872/392.1mb, in_flight_requests=1803362/1.7mb, model_inference=0/0b, eql_sequence=0/0b, accounting=14469608/13.7mb]` – Jason Norwood-Young Sep 08 '22 at 12:20
  • @JasonNorwood-Young how many documents do you have, and how much heap have you assigned to your Elasticsearch process? – Amit Sep 08 '22 at 12:25
  • About 500m. I'm going to try increasing the heap size and clearing the cache as suggested here: https://stackoverflow.com/questions/29810531/elasticsearch-kibana-errors-data-too-large-data-for-timestamp-would-be-la – Jason Norwood-Young Sep 08 '22 at 12:35
  • @JasonNorwood-Young sure, 500 MB is very little; keep us posted :) – Amit Sep 08 '22 at 12:39
  • Oh, sorry, 500 million records, not MB. The index is 320.4GB big. – Jason Norwood-Young Sep 08 '22 at 12:42
  • Wow, 500M documents. Note that the Elasticsearch heap size on a node shouldn't exceed 30 GB anyway – Amit Sep 08 '22 at 12:51
  • @JasonNorwood-Young any update on it? – Amit Oct 06 '22 at 10:23
  • I've figured out I need to build an ElasticSearch pipeline aggregation to achieve what I want. I increased heap size to 16GB, which helped, but the transaction still fails about half the time. I'll look at adding another server too. I should get to it next week and will give a full update! – Jason Norwood-Young Oct 07 '22 at 06:34
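
For reference, a minimal sketch of the pipeline-aggregation approach mentioned in the last comment, assuming each document represents a single hit and the fields are named `token` and `time` (field names taken from the question, with a 7-day window as an example): a terms aggregation groups hits per token, a bucket_selector keeps only the buckets with 15 or more documents, and a sibling stats_bucket reports how many buckets remain, so its count value is the number of "Brand Lovers". The terms size here is a hypothetical cap; at very high token cardinality a composite aggregation paged by the client may be needed instead, which also helps with the memory pressure behind the circuit-breaker error above.

{
    "size": 0,
    "query": {
        "range": {
            "time": {
                "gte": "now-7d/d",
                "lt": "now"
            }
        }
    },
    "aggs": {
        "per_token": {
            "terms": {
                "field": "token",
                "size": 65536
            },
            "aggs": {
                "at_least_15_hits": {
                    "bucket_selector": {
                        "buckets_path": {
                            "hitCount": "_count"
                        },
                        "script": "params.hitCount >= 15"
                    }
                }
            }
        },
        "brand_lovers": {
            "stats_bucket": {
                "buckets_path": "per_token._count"
            }
        }
    }
}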