How to configure an Elasticsearch cluster on one server for the best search performance

Question

I am new to Elasticsearch. I have an Elasticsearch index of about 300,000 items. For each of the 60 million records in another table, I need to run a complex query against this ES index.

Right now, it is extremely slow (1000 queries take 200 seconds). I need advice on how to configure my Elasticsearch cluster to handle a large volume of queries.
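To put those numbers in perspective, here is a rough back-of-the-envelope calculation using only the figures quoted above. It assumes the measured rate stays constant over the whole job, which is a simplification; the method names are illustrative.

```ruby
SECONDS_PER_DAY = 86_400

# Queries per second observed in a measured batch.
def throughput(queries, seconds)
  queries / seconds.to_f
end

# Days needed to process `total_records` at the measured rate
# (assumes the rate does not change over time).
def estimated_days(queries, seconds, total_records)
  total_records / throughput(queries, seconds) / SECONDS_PER_DAY
end

rate = throughput(1_000, 200)
days = estimated_days(1_000, 200, 60_000_000)

puts "throughput: #{rate} queries/s"               # 5.0 queries/s
puts "estimated total: #{days.round(1)} days"      # 138.9 days
```

At 5 queries per second, 60 million queries would take roughly 139 days, so the job needs a throughput improvement of several orders of magnitude, not a small tuning gain.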

My server:

8 cores
8 GB RAM
SSD storage

I want to configure Elasticsearch to handle 1000 concurrent search requests from Ruby (that is, I want to search for 1000 items in parallel).

I have tried the default configuration.

I think that, by default, Elasticsearch can only handle about 10-20 concurrent search requests. It uses little CPU and RAM, so I think there is room for improvement.

I could only run 100 threads from Ruby to search 1000 items, and that takes about 200 seconds. If I increase this to 1000 threads, ES returns a timeout error.
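One way to keep the client from flooding the node is to cap the number of in-flight requests with a fixed-size worker pool instead of starting one thread per item. The following is a minimal sketch of that pattern using only the Ruby standard library; `run_bounded` is a hypothetical helper name, and the block passed to it stands in for whatever code actually sends the search request (not shown here).

```ruby
# Run every job through at most `worker_count` threads.
# The block receives one job and returns its result.
# Results come back in the same order as `jobs`.
def run_bounded(jobs, worker_count, &search)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [job, index] }
  results = Array.new(jobs.size)

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop: raises ThreadError once the queue is empty.
          job, index = queue.pop(true)
        rescue ThreadError
          break
        end
        # Each worker writes to its own slot, so no slot is shared.
        results[index] = search.call(job)
      end
    end
  end

  workers.each(&:join)
  results
end

# Usage with a stand-in for the real search call (an assumption for
# illustration; replace the block body with the actual request).
queries = (1..1000).map { |i| "item-#{i}" }
results = run_bounded(queries, 16) { |q| "result for #{q}" }
puts results.size
```

The worker count then becomes a single knob to tune against observed latency and error rates, rather than letting the number of items dictate the number of concurrent requests.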

I ran a master node with the following settings:

ES_HEAP_SIZE=2G

indices.fielddata.cache.size: 1g 

threadpool:   
   search:
      type: fixed
      size: 200
      queue_size: 400

shards: 5

replicas: 1

Running 100 threads from ruby to search 1000 items still takes 200s.

I then added 3 new data nodes on this server.

Running 100 threads from ruby to search 1000 items still takes 200s or more.

I googled and read some posts. People say that creating more shards makes search slower.

How can I improve my search query?

Many thanks!

Since you mentioned that query performance improved with minimal changes to the config, it would be helpful if you could share the changes and the performance after them – gsuresh92, Oct 29 '15 at 10:54
@gsuresh92: I did this task about a year ago, so I don't have the detailed information right now. Here are some key points I can share: Increase the max open files limit in `/etc/security/limits.conf` to 65k or 100k. Increase `ES_HEAP_SIZE` to 50% of the RAM on the machine. If you want to improve parallel search, you can increase `threadpool.search.size`. When you increase this value, you may see many queries fail, so increasing `threadpool.search.queue_size` may help, but the average query time will increase – Minh Ha Pham, Oct 30 '15 at 05:45
@gsuresh92: I recommend checking the link in the answer below. It was very helpful for me when I was working with the ES cluster – Minh Ha Pham, Oct 30 '15 at 05:47

score 3 · Answer 1 · answered Apr 24 '14 at 21:49

You're going to want to watch this video:

http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/

The defaults for ES are great for development but not production. The one thing that you really need to do is give the JVM 50% of the available memory on the server. That video has lots of other great tips.
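As a small illustration of the 50% rule, the sketch below computes a heap value for a given amount of RAM. The cap just under 32 GB is not mentioned in this answer; it is an assumption added here based on commonly cited JVM guidance about compressed object pointers, and `suggested_heap_mb` is a hypothetical helper name.

```ruby
# Assumed upper bound (not from the answer above): stay below ~32 GB
# so the JVM can keep using compressed object pointers.
HEAP_CAP_MB = 31 * 1024

# Half of physical RAM, limited by the assumed cap.
def suggested_heap_mb(total_ram_mb)
  [total_ram_mb / 2, HEAP_CAP_MB].min
end

# For the 8 GB server in the question:
puts "ES_HEAP_SIZE=#{suggested_heap_mb(8 * 1024)}m"
```

For the 8 GB machine described in the question this gives 4096 MB, double the 2G heap that was originally configured.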

jhilden


Thank you for your suggestion, this video is very helpful. I will try configuring my ES server to see how it works. I could not vote up because I am new to Stack Overflow. – Minh Ha Pham Apr 25 '14 at 04:00
I followed some tips from the video and my cluster is better now. But when I run the search job from Ruby (500 search requests at the same time), ES uses very little CPU (only 10%). I use 4 nodes. I do not know why ES does not use more CPU – Minh Ha Pham Apr 25 '14 at 09:06
@MinhHa Also check disk I/O (on Linux you can use `iotop`). Maybe the CPU is just waiting for the disk to load data – ulkas Mar 09 '16 at 13:38
