
EDITED - Based on comments by @opster elasticsearch ninja, I edited the original question to keep it focused on the low disk watermark errors in ES.

For more general server optimization on a small machine, see: Debugging Elasticsearch and tuning on small server, single node

For the follow-up on the original question and considerations related to debugging ES failures, see also: https://chat.stackoverflow.com/rooms/213776/discussion-between-opster-elasticsearch-ninja-and-user305883


Problem: I noticed that Elasticsearch fails frequently, and I need to restart the server manually.

This question may relate to: High disk watermark exceeded even when there is not much data in my index

I want to better understand what Elasticsearch does when disk space runs out, how to optimise the configuration, and only afterwards eventually restart automatically when the system fails.

Could you help me understand how to read the Elasticsearch journal and choose fixes accordingly, suggesting best practices for tuning server ops on a small server machine?

My priority is to avoid system crashes; somewhat lower performance is acceptable, and there is no budget to increase the server size.

Hardware

I am running Elasticsearch on a single small server (2 GB RAM), have 3 indices (500 MB, 20 MB and 65 MB store size) and several GB free on a solid-state disk: I would like to allow use of virtual memory rather than consuming RAM.

Below is what I did:


What does the journal say?

Run `journalctl | grep elasticsearch` to explore failures related to ES:

    May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Main process exited, code=killed, status=9/KILL
    May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Unit entered failed state.
    May 13 05:44:15 ubuntu systemd[1]: elasticsearch.service: Failed with result 'signal'.

Here I can see ES was killed.
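As a note for readers: a systemd drop-in can make the service restart automatically after such a kill. A minimal sketch (the `Restart=on-failure` / `RestartSec=30` values are my own arbitrary choice, not taken from the ES docs):

    # Create a drop-in override for the elasticsearch unit
    sudo systemctl edit elasticsearch

    # In the editor that opens, add:
    #   [Service]
    #   Restart=on-failure
    #   RestartSec=30

    # Reload systemd so the override takes effect
    sudo systemctl daemon-reload

This does not fix the underlying kill, it only brings the service back up after one.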

EDITED: I found this was due to an out-of-memory error from Java; see the error in `service elasticsearch status` below. Readers may also find it useful to run:

    java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize|ThreadStackSize'

to check the current memory assignment.
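A related check, assuming the node is up: the nodes stats API reports how much heap the running JVM actually has and is using.

    # Ask the running node for its JVM heap limit and current usage
    curl -s 'localhost:9200/_nodes/stats/jvm?pretty' | grep -E 'heap_(used|max)_in_bytes'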

What does the ES log say?

check:

/var/log/elasticsearch
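For example, to follow the main log (the file is named after the cluster, hence the wildcard here):

    # Follow the main Elasticsearch log as it is written
    sudo tail -f /var/log/elasticsearch/*.log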


    [2020-05-09T14:17:48,766][WARN ][o.e.c.r.a.DiskThresholdMonitor] [my_clustername-master] high disk watermark [90%] exceeded on [Ynm6YG-MQyevaDqT2n9OeA][awesome3-master][/var/lib/elasticsearch/nodes/0] free: 1.7gb[7.6%], shards will be relocated away from this node
    [2020-05-09T14:17:48,766][INFO ][o.e.c.r.a.DiskThresholdMonitor] [my_clustername-master] rerouting shards: [high disk watermark exceeded on one or more nodes]

what does "shards will be relocated away from this node" if I only have one server and one instance working ?

service elasticsearch status

 Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-05-09 13:47:02 UTC; 32min ago
     Docs: http://www.elastic.co
  Process: 22691 ExecStartPre=/usr/share/elasticsearch/bin/elasticsearch-systemd-pre-exec (code=exited, status=0/SUCCES
 Main PID: 22694 (java)
   CGroup: /system.slice/elasticsearch.service
           └─22694 /usr/bin/java -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+U

What does my configuration say?

I am using the default configuration in `/etc/elasticsearch/elasticsearch.yml`

and don't have any watermark options configured, like those in https://stackoverflow.com/a/52006486/305883

Should I include them? What would they do?
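For reference, a hedged sketch of those settings applied via the cluster settings API; the percentages shown are the documented defaults, included only to illustrate the knobs (they can equally be set in `elasticsearch.yml`):

    # Illustration only: set the disk watermark thresholds transiently
    curl -s -X PUT 'localhost:9200/_cluster/settings' \
      -H 'Content-Type: application/json' -d '
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "85%",
        "cluster.routing.allocation.disk.watermark.high": "90%",
        "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
      }
    }'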

Please note I have not uncommented `bootstrap.memory_lock: true`, because I only have 2 GB of RAM.

Even if Elasticsearch performs poorly when memory is swapping, my priority is that it does not fail and that the sites stay up and running.

Running on a single-node machine - how to handle unassigned replicas?

I understand that replicas cannot be assigned to the same node as their primaries. As a consequence, does it make sense to have replicas on a single node? If a primary shard fails, will the replicas come to the rescue, or will they be unused anyway?

I wonder whether I should delete them to free space, or better not to.

  • This journal log you shared is not related to the `elasticsearch` service; it is an `sshd` service log. The high disk watermark is an `elasticsearch` config that has a default value of `90%`, which means that if the disk where Elasticsearch saves its data is more than `90%` full, it will stop indexing anything. Your log says that you have only `7.6%` free (more than 90% in use); where do you have several GB free? Is it on another disk? If so, you could move your data directory to that disk. – leandrojmp May 09 '20 at 18:29
  • Thank you for clarifying my mistake in thinking that log was related to ES. I found that there were huge logs from an application that consumed disk space. From your comment and the answer below, I understand that there is not much to do but add disk space to address low disk watermarks. Is that correct? Which best practices would you suggest to restart ES automatically when it is at critical risk of crashing? About the authentication errors, how would you suggest using the journal to check whether there are still security or performance risks due to continuous login attempts? – user305883 May 11 '20 at 09:28
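For readers: the data directory mentioned in the first comment is configured via `path.data` in `elasticsearch.yml`. A minimal sketch of moving it, assuming a hypothetical second disk mounted at `/mnt/data`:

    # In /etc/elasticsearch/elasticsearch.yml, point ES at the new disk:
    #   path.data: /mnt/data/elasticsearch
    # Then move the data and fix ownership before restarting:
    sudo systemctl stop elasticsearch
    sudo rsync -a /var/lib/elasticsearch/ /mnt/data/elasticsearch/
    sudo chown -R elasticsearch:elasticsearch /mnt/data/elasticsearch
    sudo systemctl start elasticsearch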

1 Answer


Explanation of your question:

"Shards will be relocated away from this node" if I only have one server and one instance working?

Elasticsearch considers the available disk space before deciding whether to allocate new shards, relocate shards away, or put all indices into read-only mode, based on the different thresholds of this error. The reason is that Elasticsearch indices consist of shards which are persisted on data nodes, and low disk space can cause the issues above.

In your case, as you have just one data node, all the indices on that data node will be put into read-only mode, and even if you free up space they will not return to write mode until you explicitly hit the API mentioned in Opster's guide.
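For reference, a sketch of that reset call, assuming the flood-stage read-only block was applied; `_all` targets every index:

    # After freeing disk space, clear the read-only-allow-delete block
    curl -s -X PUT 'localhost:9200/_all/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index.blocks.read_only_allow_delete": null}'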

Edit: On a single node it is better to disable replicas, as Elasticsearch will not allocate a replica of a shard to the same data node that holds its primary. So it doesn't make sense to have replicas in a single-node Elasticsearch cluster, and keeping them will unnecessarily mark your index and cluster health yellow (missing replicas). A sketch of how to drop them follows below.
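A minimal sketch of dropping the replicas on all existing indices (new indices would need the same setting, or an index template):

    # Set the replica count to zero for every existing index
    curl -s -X PUT 'localhost:9200/_all/_settings' \
      -H 'Content-Type: application/json' \
      -d '{"index.number_of_replicas": 0}'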

– Amit
  • thank you for clarifying what happens with one single node, and for your useful service offering recommendation on ES config – user305883 May 11 '20 at 09:15
  • Unfortunately the issue is still open - I freed disk space, and found out today that ES failed, and failed to automatically restart. I noticed in the log there is a `ShardNotFoundException` and that, even after restarting the service, `curl 'localhost:9200/_cat/shards?v'` shows an unassigned index. I don't know if that is what is causing ES to crash. Please let me know if I should modify the question to make it more general, as it is now not related to low disk watermarks but more to how to debug ES, and maybe to tuning a small server instance – user305883 May 12 '20 at 14:58
  • @user the primary shard of one of your indices is missing, which is causing this new issue and log. I would advise you to open a new question with all the details, i.e. ES startup logs and all the index and shard info; you can use the `_cat/indices?v` API to get the index, shard and replica info – Amit May 12 '20 at 15:08
  • @user305883, please comment here as soon as you have the new question with all the details and I shall look into it – Amit May 12 '20 at 15:10
  • kind of you, opster. I posted a new question then: https://stackoverflow.com/questions/61755662/debugging-elasticsearch-and-tuning-on-small-server-single-node – user305883 May 12 '20 at 15:30
  • @user305883, thanks for asking a new question, I will have a look at it. It would also be great if you could accept this answer, since it solved the low disk watermark issue; marking the answer helps the community identify the solution :) Also, if possible, remove the journal logs from the question and keep only the disk watermark part to make it more readable and focused. – Amit May 13 '20 at 04:48
  • your suggestions sound good; however, is it safe to rule out that the unauthorized login attempts reported by the journal have any impact on the reliability of the service? Initially I thought the journal log was about ES, then I thought that ES might go down as a consequence of unauthorized attacks. If we can exclude this as a possible cause, I will follow your suggestions and will be glad to make the question more useful for the community. Thank you for the guidance and tips! – user305883 May 13 '20 at 09:35
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213776/discussion-between-opster-elasticsearch-ninja-and-user305883). – Amit May 13 '20 at 11:55