
Recently, I have encountered increases in heap memory usage on the master nodes (heap memory overflow on the master nodes, with continuous garbage collection). I tried to debug the root cause using the heap dumps saved in storage (sample file name for reference: java_pid1.hprof), but those files appear to be encrypted and I was unable to find anything in them.

Is this the correct way to debug the heap memory issue? If yes, how do I get a readable (decrypted) heap dump with proper information? If not, how should I debug the heap memory issue on the master nodes?

Elasticsearch info:

Running in Kubernetes, with 3 dedicated master nodes and 3 data nodes (which are also the ingest nodes).

  • 3 data nodes: each node has 64 GB RAM, a 32 GB memory limit, a 28 GB heap, and a 1 TB disk
  • 3 master nodes: each node has 16 GB RAM, a 4 GB memory limit, a 4 GB heap, and a 10 GB disk
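
For reference, a minimal sketch of how the per-node JVM heap usage can be read from the nodes stats API, to watch the master nodes' heap climb (assuming the cluster is reachable at http://localhost:9200 without authentication and the Python `requests` library is available; adjust for your setup):

```python
# Minimal sketch: poll per-node JVM heap usage from the nodes stats API.
# Assumptions: cluster at http://localhost:9200, no authentication,
# and the `requests` library installed.
import time
import requests

ES_URL = "http://localhost:9200"  # assumed endpoint, adjust for your cluster

def print_heap_usage():
    # GET /_nodes/stats/jvm returns JVM stats (including heap) for every node
    stats = requests.get(f"{ES_URL}/_nodes/stats/jvm").json()
    for node in stats["nodes"].values():
        roles = node.get("roles", [])
        heap = node["jvm"]["mem"]
        print(
            f"{node['name']:<20} roles={','.join(roles):<30} "
            f"heap_used={heap['heap_used_percent']}% "
            f"({heap['heap_used_in_bytes'] // 1024 // 1024} MB of "
            f"{heap['heap_max_in_bytes'] // 1024 // 1024} MB)"
        )

if __name__ == "__main__":
    # Watch how quickly the heap grows on the master nodes over time.
    while True:
        print_heap_usage()
        time.sleep(30)
```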

1 Answer


Hprof files can be opened inside Eclipse. Eclipse has a dedicated plugin for opening hprof files, called the Memory Analyzer Tool (MAT).
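
As a side note, hprof dumps are not encrypted, only binary: a valid dump starts with a NUL-terminated ASCII version string such as "JAVA PROFILE 1.0.2". A quick sanity check before loading the file into MAT (a minimal Python sketch, using the file name from the question):

```python
# Quick sanity check that java_pid1.hprof is an ordinary binary HPROF dump
# rather than an encrypted file: HPROF dumps begin with a NUL-terminated
# ASCII version string such as "JAVA PROFILE 1.0.2".
def looks_like_hprof(path):
    with open(path, "rb") as f:
        header = f.read(32)
    return header.startswith(b"JAVA PROFILE")

print(looks_like_hprof("java_pid1.hprof"))  # True for a valid HPROF dump
```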

I have done these exercises in the past, but usually you find nothing much there.

Thanks.

brugia
  • thanks brugia for your response. If it doesn't help that much, is there any other way to figure out the cause of the heap overflow? – Balaji Arun Jun 29 '22 at 13:07
  • Yes. Can you please tell me which plugins are running in the Elasticsearch cluster, what your replication settings are, and how many shards you have on the index? An index mapping will help, as long as it's not something proprietary. Thanks. – brugia Jun 29 '22 at 14:20
  • Please close the issue or mark the answer as correct, thank you! – brugia Jun 30 '22 at 07:35
  • But why do you need the info about shards and replicas, which are more related to the data nodes, right? This issue is on the master node. I still can't close the issue because I wasn't able to find the root cause of the continuous heap overflow. – Balaji Arun Jun 30 '22 at 09:52
  • The coordination of moving data between shard and replica settings is done by the master node. You will hit OOM on the master node if you haven't configured those correctly. Finally, without seeing the entire settings of the index, config.yml, and other details, it is impossible to sort out this issue. Thanks. – brugia Jun 30 '22 at 09:54
  • Is there any article that explains all those settings? My shard allocation varies: some indexes have 2 shards, some have 5, and some have more than 10. Replicas are always 1 for all the indexes, and the indexes are not fixed; they are created dynamically. – Balaji Arun Jul 01 '22 at 06:51
  • Every shard is going to be replicated by the number of replicas. The first thing you need to check is whether setting replicas to 0 makes any difference to the OOM (a minimal sketch of that settings change follows after these comments). If that works, it means you have too many shards and too little RAM. If it doesn't work, get back to me. – brugia Jul 01 '22 at 08:06
  • thanks for your response, will test this out – Balaji Arun Jul 01 '22 at 08:24
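
To make the experiment from the comments concrete, here is a minimal sketch that lists the per-index shard and replica counts and temporarily sets replicas to 0 on a single index (the endpoint URL, the absence of authentication, and the example index name are assumptions for illustration):

```python
# Minimal sketch of the experiment suggested in the comments: list shard and
# replica counts per index, then temporarily drop replicas to 0 on one index
# to see whether the master-node OOM goes away.
# Assumptions: cluster at http://localhost:9200, no auth, `requests` installed.
import requests

ES_URL = "http://localhost:9200"  # assumed endpoint

def list_shard_counts():
    # GET /_cat/indices reports primary ("pri") and replica ("rep") counts
    indices = requests.get(f"{ES_URL}/_cat/indices?format=json").json()
    for idx in indices:
        print(f"{idx['index']:<40} primaries={idx['pri']} replicas={idx['rep']}")

def set_replicas(index_name, replicas=0):
    # number_of_replicas is a dynamic index setting, changeable on a live index
    resp = requests.put(
        f"{ES_URL}/{index_name}/_settings",
        json={"index": {"number_of_replicas": replicas}},
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    list_shard_counts()
    # Hypothetical index name; drop replicas on one index at a time, watch the
    # master-node heap, and revert by setting replicas back to the old value.
    # set_replicas("my-index-000001", replicas=0)
```

Because the replica count is a dynamic setting, it can be lowered and restored on a live index while the master-node heap is being observed.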