
I'm looking for ways to cut down the response time of Elasticsearch percolation and to reduce CPU utilization while it is being performed.

I tried a bunch of steps and managed to bring down the response time, but that impacted CPU utilization. I'm using Elasticsearch 5.6 and I'm checking whether I can at least get a response time of under 2 seconds.

The steps I tried are listed below (a minimal setup sketch follows the list):

  1. Tried running the percolator query with 1 node and 1 shard. The response time was very poor, varying between 37 and 40 seconds.

  2. Tried running the percolator query with 1 node and 3 shards. The response time was better, but still not great: it varied between 14 and 16 seconds. This was a scenario where I attempted over-allocation of shards to see if it made a difference. Although the response time improved, CPU utilization was over 90% on a 4-core, 32 GB VM. There was a memory spike, but nothing alarming. I think memory would become a concern if consecutive percolator queries were attempted.

  3. Tried running the percolator query with 1 node and 10 shards. The response time was better, but still not great: it varied between 13 and 15 seconds.

  4. Checked out some links from the Elasticsearch GitHub discussions and tried reducing the number of terms, but that started affecting the scoring, so I had to abandon this step, since scoring and matching need to stay consistent for the use case I'm working on.
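
Below is a minimal sketch of the kind of setup used in steps 1–3, written as Elasticsearch 5.6 REST calls. The index name (`skill-percolator`), type name (`doc`), and sample document are illustrative, not from my actual setup; the mapping only covers the fields that appear in the sample queries in the comments, `number_of_shards` is the setting that was varied across the steps, and the `...` stands for one of the stored query bodies shown in the comments.

```
# Percolator index; number_of_shards is what was varied in steps 1-3
PUT /skill-percolator
{
  "settings": { "number_of_shards": 3, "number_of_replicas": 0 },
  "mappings": {
    "doc": {
      "properties": {
        "query": { "type": "percolator" },
        "analysis": {
          "properties": {
            "very_higher": { "properties": { "skill_name": { "type": "keyword" } } },
            "higher":      { "properties": { "skill_name": { "type": "keyword" } } }
          }
        },
        "additional_skills": { "properties": { "skill_name": { "type": "text" } } }
      }
    }
  }
}

# Register one stored query (body as in the sample queries in the comments)
PUT /skill-percolator/doc/1
{
  "query": { ... }
}

# Percolate a document against all stored queries
GET /skill-percolator/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document_type": "doc",
      "document": {
        "analysis": { "very_higher": { "skill_name": "java-j2ee_l1" } },
        "additional_skills": { "skill_name": "java j2ee" }
      }
    }
  }
}
```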

The links I referred to are listed below.

How to improve percolator performance in ElasticSearch?

https://github.com/elastic/elasticsearch/issues/26307

https://github.com/elastic/elasticsearch/issues/25445

  • Can you provide a recreation script? How big is your data? – Val Jan 18 '18 at 12:19
  • The index size is 6 MB and there are 1700 documents in it. I wanted to try it on a small set before attempting anything large. Do you need the query? Because I'm not sure I'm allowed to share the data. – Tejas Chandra Jan 19 '18 at 12:37
  • 6MB for 1700 docs is a reaaaally tiny index... it should run in a breeze... How big are your documents? What kind of query are you trying to percolate? A complex one? – Val Jan 19 '18 at 12:39
  • It's in KBs. The query, on the other hand, is complex, with scripting being used to replace Elastic's default scoring by adding weights. There are over 100 terms. – Tejas Chandra Jan 19 '18 at 13:35
  • Ok, scripting might explain the bad perf then. But without knowing your mapping and query(ies), it'll be hard to optimize that – Val Jan 19 '18 at 13:52
  • Sorry for the delay in response. Below is a sample percolator query we are using for testing purposes. The original query is too big to post here. The fields **analysis.very_higher.skill_name** and **analysis.higher.skill_name** are mapped as keyword, and the field **additional_skills.skill_name** is mapped as text with the standard Elastic analyzer. – Tejas Chandra Jan 23 '18 at 09:58
  • {"query":{"bool":{"should":[{"function_score":{"boost_mode":"replace","query":{"query_string":{"fields":["analysis.very_higher.skill_name","additional_skills.skill_name"],"query":"\"java-j2ee_l1\"|\"java-j2ee\""}},"score_mode":"sum","script_score":{"script":{"lang":"groovy","params":{"param1":0.45},"source":"param1"}}}}]},"disable_coord": true}} – Tejas Chandra Jan 23 '18 at 10:00
  • {"query":{"bool":{"should":[{"function_score":{"boost_mode":"replace","query":{"query_string":{"fields":["analysis.higher.skill_name"],"query":"\"java-j2ee_l1\" -(additional_skills.skill_name:\"java-j2ee\")"}},"score_mode":"sum","script_score":{"script":{"lang":"groovy","params":{"param1":0.405},"source":"param1"}}}}]},"disable_coord": true}} – Tejas Chandra Jan 23 '18 at 10:00
  • Usually, the function_score queries above come in as elements under 'should'. I have separated them here for convenience... – Tejas Chandra Jan 23 '18 at 10:02
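
For readability, here is the first sample query from the comments, pretty-printed. The content is unchanged except that `disable_coord` is shown inside the `bool` clause, which is where Elasticsearch 5.6 expects it (in the comment it appears one level up, as a sibling of `bool`). The Groovy script simply returns the constant `param1`, i.e. it replaces the natural relevance score with a fixed weight of 0.45.

```
{
  "query": {
    "bool": {
      "disable_coord": true,
      "should": [
        {
          "function_score": {
            "boost_mode": "replace",
            "query": {
              "query_string": {
                "fields": [
                  "analysis.very_higher.skill_name",
                  "additional_skills.skill_name"
                ],
                "query": "\"java-j2ee_l1\"|\"java-j2ee\""
              }
            },
            "score_mode": "sum",
            "script_score": {
              "script": {
                "lang": "groovy",
                "params": { "param1": 0.45 },
                "source": "param1"
              }
            }
          }
        }
      ]
    }
  }
}
```

The second sample query has the same shape; only the queried field (`analysis.higher.skill_name`), the query string, and the weight (0.405) differ.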
