
I am trying to sync 1 million records to ES, and I am doing it using the bulk API in batches of 2k. But after inserting around 25k-32k documents, Elasticsearch throws the following exception.

Unable to parse response body: org.elasticsearch.ElasticsearchStatusException
ElasticsearchStatusException[Unable to parse response body]; nested: ResponseException[method [POST], host [**********], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk]; nested: ResponseException[method [POST], host [************], URI [/_bulk?timeout=1m], status line [HTTP/1.1 403 Request throttled due to too many requests]
403 Request throttled due to too many requests /_bulk];

I am using AWS Elasticsearch. I think I need to implement a wait strategy to handle this, something like repeatedly checking the ES status and calling bulk insert only when ES is okay. But I am not sure how to implement it. Does ES offer anything pre-built for this? Or is there a better way to handle it?

Thanks in advance.

Update: I am using AWS Elasticsearch version 6.8

Manish
    Check out https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html and https://stackoverflow.com/a/64896210/6200672 once. – dravit Feb 22 '21 at 11:10
  • @dravit I checked those two answers, but they did not answer what I am looking for. Basically, I am looking for an approach to do "exponential backoff". – Manish Feb 22 '21 at 11:54
  • Any specific reason for exponential backoff approach? – dravit Feb 22 '21 at 15:06
  • @dravit so I am using AWS Elasticsearch, and they only support ES up to version 7.9 as of now, and my team is on 6.8 in the current situation. Also, I am already making calls using the bulk API with a batch size of 2000 documents. Currently, I am working on an approach that checks the ES response after calling the bulk API, waits for some time, and then sends the next bulk request. If you have any better suggestion, please share. – Manish Feb 23 '21 at 01:57
  • ES started throwing error cause of high mem usage. Even on my local machine, I've never faced this issue. It could be a limitation/error by AWS based on the pricing and your current plan. Can you try with a bigger batch size, say 6k and an explicit wait of 5 seconds? Also, what is the value of `refresh_interval` (You'll find it in index settings)? – dravit Feb 24 '21 at 05:13
  • @dravit I am able to insert 100k now by doing something like this: `bulkRequest.timeout(TimeValue.timeValueMinutes(2)); bulkRequest.setRefreshPolicy(RefreshPolicy.WAIT_UNTIL); BulkResponse bulkResponse = esWrapper.bulkRequest(bulkRequest); /* sleep for 1 sec */ sleepAfterBatchWrite();` – Manish Feb 24 '21 at 07:28
  • I don't think that using a timeout is the way to go here. I would advise to maybe use smaller batch sizes and write your code in a way that only sends the next batch once the previous one is done and the call has returned. – Val Feb 25 '21 at 05:11
  • @Val, yes, currently I am doing exactly that, waiting 1 sec before sending the next batch. But with this approach it will take a lot of time to insert all the data. – Manish Feb 25 '21 at 05:14

2 Answers


Thanks @dravit for including my previous SO answer in the comments. After following the comments, it seems OP wants to improve the performance of bulk indexing and wants exponential backoff, which I don't think Elasticsearch provides out of the box.

I see that you are putting a pause of 1 second after every batch, which will not work in all cases, and if you have a large number of batches and documents to be indexed, it will certainly take a lot of time. There are a few more suggestions from my side to improve the performance.

  1. Follow my tips to improve the reindex speed in Elasticsearch, see which of the things listed there are applicable, and measure by what factor applying them improves speed.
  2. Find a batching strategy that best suits your environment. I am not sure, but this article from @spinscale, the developer of the Java high-level REST client, might help, or you can ask a question on https://discuss.elastic.co/. I remember he shared a very good batching strategy in one of his webinars, but I couldn't find the link to it.
  3. Monitor various ES metrics apart from the bulk threadpool and queue size, and if your ES still has capacity, see whether you can increase the queue size and the rate at which you send requests to ES.
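As a sketch of point 2: the Java high-level REST client ships a `BulkProcessor` that batches documents for you and retries rejected bulk requests with a configurable `BackoffPolicy`. A minimal setup might look like the following (the batch size of 2000 comes from the question; the retry count and delays are illustrative, and this is a sketch against a 6.8/7.x-era client, not a drop-in implementation):

```java
import java.util.function.BiConsumer;

import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;

public class BulkIndexerSketch {

    public static BulkProcessor buildProcessor(RestHighLevelClient client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) {
                // called just before each bulk request is sent
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println("Bulk had item failures: " + response.buildFailureMessage());
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // the whole request failed; e.g. the AWS 403 throttle surfaces here
                System.err.println("Bulk request failed: " + failure.getMessage());
            }
        };

        BiConsumer<BulkRequest, ActionListener<BulkResponse>> consumer =
            (request, bulkListener) -> client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener);

        return BulkProcessor.builder(consumer, listener)
            .setBulkActions(2000)             // flush every 2k documents, as in the question
            .setConcurrentRequests(1)         // one request in flight while the next batch accumulates
            // exponential backoff: retry up to 5 times, starting at 500 ms and doubling
            .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(500), 5))
            .build();
    }
}
```

Documents are then fed in with `processor.add(indexRequest)`, and `processor.awaitClose(...)` flushes the tail at the end. One caveat: the built-in backoff only retries bulks rejected by Elasticsearch's own queue (`EsRejectedExecutionException`, i.e. 429-style rejections); the AWS-specific `403 Request throttled` response arrives as an outright failure in the second `afterBulk` callback, so it may still need an application-level retry on top.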
Amit
    I found something; I will be trying this next week, most likely. ES has backoff built in: `builder.setBackoffPolicy(BackoffPolicy.constantBackoff(TimeValue.timeValueSeconds(1L), 3));` link: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-bulk.html – Manish Feb 26 '21 at 13:03

Check the error handling guide here

If you receive persistent 403 Request throttled due to too many requests or 429 Too Many Requests errors, consider scaling vertically. Amazon Elasticsearch Service throttles requests if the payload would cause memory usage to exceed the maximum size of the Java heap.

Scale your domain vertically or increase the delay between requests.
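If you stay with manual bulk calls, "increase the delay between requests" can be made adaptive rather than a fixed 1-second sleep: retry a throttled call with an exponentially growing, capped delay. A minimal, client-agnostic sketch (the helper names and parameters are illustrative, not from any library; the real `call` would be your bulk request):

```java
import java.util.concurrent.Callable;

public class BackoffRetry {

    /** Delay before retry attempt {@code attempt} (0-based): baseMillis * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // clamp the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /** Runs {@code call}, retrying up to maxRetries times with exponential backoff
     *  whenever it throws (e.g. on a 403/429 throttling response). */
    static <T> T withBackoff(Callable<T> call, int maxRetries, long baseMillis, long maxMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxRetries) throw e;       // give up after maxRetries retries
                Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical throttled call: fails twice, then succeeds.
        final int[] calls = {0};
        String result = withBackoff(() -> {
            if (++calls[0] < 3) throw new RuntimeException("403 Request throttled");
            return "indexed";
        }, 5, 100, 2000);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Compared to a fixed sleep, this only slows down when the cluster actually pushes back, so a fast cluster is not penalized on every batch.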

  • probably not the best way cause this can be controlled in other ways also. – dravit Feb 22 '21 at 15:05
  • @mehdi fahti, I have read this in the documentation; please read the question carefully. I am asking for a better way to "increase delay between requests". – Manish Feb 23 '21 at 02:02