I have been trying to get word frequency in Elasticsearch. I am using the Elasticsearch Python client and the Elasticsearch DSL Python client.
Here is my code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(["my_ip_machine:port"])

# Filter to matching documents, then run a terms aggregation
# over every token in the "content" field
s = Search(using=client, index=settings.ES_INDEX) \
    .filter("term", content=keyword) \
    .filter("term", provider=json_input["media"]) \
    .filter("range", **{"publish": {"from": begin, "to": end}})
s.aggs.bucket("group_by_state", "terms", field="content")
result = s.execute()
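For reference, I believe the DSL above builds roughly this raw request body (my own sketch, written as a Python dict and using the same variables as in the code above):

body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content": keyword}},
                {"term": {"provider": json_input["media"]}},
                {"range": {"publish": {"from": begin, "to": end}}},
            ]
        }
    },
    # Terms aggregation that counts every token in "content"
    "aggs": {
        "group_by_state": {"terms": {"field": "content"}}
    },
}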
I run that code and get output like this (I trimmed the output to make it more concise):
{
    "word1": 8,
    "word2": 8,
    "word3": 6,
    "word4": 4
}
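Roughly, I build that dict from the aggregation buckets in the response like this (a sketch; elasticsearch-dsl exposes each bucket's key and doc_count on the response object):

word_freq = {
    bucket.key: bucket.doc_count
    for bucket in result.aggregations.group_by_state.buckets
}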
The code runs without problems against an Elasticsearch instance with only 2,000 documents on my laptop. However, it fails when I run it on my Droplet on DigitalOcean: that Elasticsearch instance holds more than 2,000,000 documents, and the Droplet has only 1 GB of RAM. Every time I run the code, memory usage keeps increasing until Elasticsearch shuts down.
Is there another, more efficient way to get word frequency in Elasticsearch with a large number of documents? An answer as a raw Elasticsearch query is not a problem; I will convert it to DSL.
Thank you :)