
I am new to Elasticsearch. I have an Elasticsearch index of about 300,000 items. For each of the 60 million records in another table, I need to run a complex query against this ES index.

Right now it is extremely slow (1000 queries take about 200 seconds). I need advice on how to configure my Elasticsearch cluster to handle a large volume of queries.

My server:

8 cores
8 GB RAM
SSD storage

I want to configure Elasticsearch to handle 1000 concurrent search requests from Ruby (that is, I want to search for 1000 items in parallel).

I have tried the default configuration.

I think that by default Elasticsearch can only handle about 10-20 concurrent search requests. It uses little CPU and RAM, so I think there is room for improvement.

I could only run 100 threads from Ruby to search 1000 items, and that takes about 200 seconds. If I increase to 1000 threads, ES returns a timeout error.
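One way to bound client-side concurrency, instead of starting one thread per request, is a fixed pool of workers pulling jobs from a queue. The following is a minimal sketch in plain Ruby, not a tuned implementation; `parallel_map` and `fake_search` are hypothetical names, and the real Elasticsearch call would go where `fake_search` is used.

```ruby
require "thread" # Queue; already loaded on modern Ruby, harmless otherwise

# Runs the block once per item using at most `workers` threads.
# Results come back in the same order as `items`.
def parallel_map(items, workers: 16, &block)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  results = Array.new(items.size)

  threads = Array.new([workers, items.size].min) do
    Thread.new do
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          item, index = jobs.pop(true)
        rescue ThreadError
          break
        end
        results[index] = block.call(item)
      end
    end
  end

  threads.each(&:join)
  results
end

# Hypothetical stand-in for the real search request (assumption, not an ES API).
fake_search = ->(id) { "hit-#{id}" }

hits = parallel_map((1..1000).to_a, workers: 16) { |id| fake_search.call(id) }
puts hits.size
```

With this shape the number of in-flight requests never exceeds `workers`, so the client side can be sized against the server's search thread pool rather than against the number of items.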

I run a master node with:

ES_HEAP_SIZE=2G

indices.fielddata.cache.size: 1g 

threadpool:   
   search:
      type: fixed
      size: 200
      queue_size: 400

shards: 5

replicas: 1

Running 100 threads from Ruby to search 1000 items still takes 200 s.
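To put these figures in perspective, here is a rough back-of-the-envelope sketch in Ruby. It uses only the numbers stated in this question and assumes all 100 client threads stay busy for the whole run, which may not hold exactly; the helper names are made up for illustration.

```ruby
# Figures from the question: 1000 queries, 200 seconds, 100 client threads.
def throughput_qps(queries, seconds)
  queries.to_f / seconds
end

# Little's law: average latency = requests in flight / throughput.
# Assumption: every client thread always has a request in flight.
def avg_latency_seconds(in_flight, qps)
  in_flight / qps
end

qps = throughput_qps(1000, 200)
latency = avg_latency_seconds(100, qps)

# Time to process all 60 million records at this rate, in days.
days = 60_000_000 / qps / 86_400

puts "throughput: #{qps} queries/s"
puts "implied average latency: #{latency} s per query"
puts "60M records at this rate: about #{days.round} days"
```

Under that assumption the run works out to 5 queries per second and roughly 20 seconds per query, which suggests that per-query latency, and not only the concurrency limit, is worth investigating.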

I then added 3 new data nodes on the same server.

Running 100 threads from Ruby to search 1000 items still takes 200 s or more.

I googled this and read some posts. People say that creating more shards makes searches slower.

How can I improve my search query performance?

Many thanks!

AdamNYC
Minh Ha Pham
  • Can you provide your search queries? – Michael at qbox.io Apr 24 '14 at 14:43
  • As you mentioned that query performance improved with minimal changes to the config, it would be helpful if you could share the changes and the performance after them – gsuresh92 Oct 29 '15 at 10:54
  • @gsuresh92: I did this task about a year ago, so I don't have the detailed information right now. Some key points I can share: increase the max open files limit in `/etc/security/limits.conf` to 65k or 100k. Increase `ES_HEAP_SIZE` to 50% of the machine's RAM. If you want to improve parallel search, you can increase `threadpool.search.size`. When you increase this value you may see many queries fail, so increasing `threadpool.search.queue_size` may help, but the average query time will go up – Minh Ha Pham Oct 30 '15 at 05:45
  • @gsuresh92: I recommend checking the link in the answer below. It was very helpful to me when I was working with the ES cluster – Minh Ha Pham Oct 30 '15 at 05:47
  • @MinhHa Thank you very much :) – gsuresh92 Oct 30 '15 at 08:15

1 Answer


You're going to want to watch this video:

http://www.elasticsearch.org/webinars/elasticsearch-pre-flight-checklist/

The defaults for ES are great for development but not production. The one thing that you really need to do is give the JVM 50% of the available memory on the server. That video has lots of other great tips.

jhilden
  • Thank you for your suggestion, this video is very helpful. I will try configuring my ES server to see how it works. I could not vote up because I am new to Stack Overflow. – Minh Ha Pham Apr 25 '14 at 04:00
  • I followed some tips from the video and my cluster is better now. But when I run the search job from Ruby (500 search requests at the same time), ES uses very little CPU (only 10%). I use 4 nodes. I do not know why ES does not use more CPU – Minh Ha Pham Apr 25 '14 at 09:06
  • @MinhHa also check disk I/O (on Linux you can use `iotop`). Maybe the CPU is just waiting for the disk to load data – ulkas Mar 09 '16 at 13:38