
I have a two-node cluster hosted on Elastic Cloud.

Host     Elastic Cloud
Platform Google Cloud
Region   US Central 1 (Iowa)
Memory   8 GB
Storage  192 GB
SSD      Yes
HA       Yes

Each node has:

Allocated processors    2
Number of processors    2
Number of indices       4*
Shards (per index)      5*
Number of replicas      1
Number of documents     150M
Allocated disk          150 GB

* These are the main indices; Kibana and Watcher create a bunch of additional small indices.

My documents are mostly text. There are some other fields (no more than 5 per index) and no nested objects. Index specs:

| Index   | Avg Doc Length | # Docs | Disk |
|---------|----------------|--------|------|
| index-1 | 300            | 80M    | 70GB |
| index-2 | 500            | 5M     | 5GB  |
| index-3 | 3000           | 2M     | 10GB |
| index-4 | 2500           | 18M    | 54GB |
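The per-index numbers above can be cross-checked against the `_cat/indices` API. A minimal sketch, assuming a hypothetical endpoint and basic-auth credentials:

```python
# Minimal sketch: list doc counts and on-disk size per index via _cat/indices.
# ES_URL and AUTH are hypothetical placeholders.
import requests

ES_URL = "https://my-cluster.es.us-central1.gcp.cloud.es.io:9243"  # hypothetical
AUTH = ("elastic", "changeme")                                      # hypothetical

resp = requests.get(
    f"{ES_URL}/_cat/indices",
    params={"v": "true", "h": "index,docs.count,pri.store.size,store.size"},
    auth=AUTH,
    timeout=30,
)
print(resp.text)  # one row per index: name, doc count, primary and total size
```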

When the system is idle, response time is typically a few seconds. But when I simulate the behavior of 10 users, I start to get timeouts in my application. Originally the timeout was 10 s; I increased it to 60 s and I am still having issues. Below is a chart from a simulation of 10 concurrent users hitting the Search API.

[Chart: per-request time during the 10-concurrent-user simulation]

The red line is the total request time in seconds and the dotted pink line is my 60-second timeout. So I'd say that most of the time my users will experience a timeout. The query I used is quite simple:

{
    "size": 500,
    "from": ${FROM},
    "query":{
        "query_string": {
            "query": "good OR bad"
        }
    }
}
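A minimal sketch of one way to reproduce the 10-concurrent-user simulation against the Search API (the endpoint, credentials, and target index are hypothetical placeholders, and `from` is fixed at 0 here in place of the `${FROM}` placeholder):

```python
# Minimal load-test sketch: 10 threads each issue the query above once
# against the _search endpoint and report status and elapsed time.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ES_URL = "https://my-cluster.es.us-central1.gcp.cloud.es.io:9243"  # hypothetical
AUTH = ("elastic", "changeme")                                      # hypothetical
QUERY = {
    "size": 500,
    "from": 0,  # stands in for the ${FROM} placeholder above
    "query": {"query_string": {"query": "good OR bad"}},
}

def one_search(_):
    start = time.time()
    resp = requests.post(
        f"{ES_URL}/index-1/_search",  # index-1 taken from the table above
        json=QUERY,
        auth=AUTH,
        timeout=60,  # same 60 s timeout as the application
    )
    return time.time() - start, resp.status_code

with ThreadPoolExecutor(max_workers=10) as pool:  # 10 concurrent "users"
    for took, status in pool.map(one_search, range(10)):
        print(f"status={status} took={took:.1f}s")
```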

I've tried every tweak I know of. I don't know whether this is simply the real ES performance for this setup and I have to accept it and upgrade my plan.

Montenegrodr
  • I would have liked to see more graphs from your 10-user test: at least CPU usage and garbage-collection behavior. 2 CPUs per node is not much: the combined CPUs of those two nodes is what a decent laptop has these days, at a minimum. – Andrei Stefan Jan 17 '18 at 23:58
  • @AndreiStefan I don't have them anymore. But CPU and memory pressure weren't anywhere near 100% (I would say slightly above 50%), and I had some GC peaks. I think the problem is RAM: there are 150 GB of data and only 8 GB (4 per node) to serve them. Does this make sense? – Montenegrodr Jan 18 '18 at 17:43
  • If you don't have those, I won't try to guess what the reason was. – Andrei Stefan Jan 19 '18 at 05:08
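Since the comments above ask about CPU and GC graphs, here is a minimal sketch of how those numbers could be sampled from the nodes stats API during a test (the endpoint and credentials are again hypothetical placeholders):

```python
# Minimal sketch: pull per-node CPU, heap, and GC totals from the nodes stats API.
import requests

ES_URL = "https://my-cluster.es.us-central1.gcp.cloud.es.io:9243"  # hypothetical
AUTH = ("elastic", "changeme")                                      # hypothetical

stats = requests.get(f"{ES_URL}/_nodes/stats/os,jvm", auth=AUTH, timeout=30).json()

for node_id, node in stats["nodes"].items():
    cpu = node["os"]["cpu"]["percent"]              # node CPU usage in %
    heap = node["jvm"]["mem"]["heap_used_percent"]  # JVM heap usage in %
    gc = node["jvm"]["gc"]["collectors"]
    print(
        f"{node['name']}: cpu={cpu}% heap={heap}% "
        f"young_gc={gc['young']['collection_time_in_millis']}ms "
        f"old_gc={gc['old']['collection_time_in_millis']}ms"
    )
```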

0 Answers