
I am using Elasticsearch 5.1.2 with the Elasticsearch Java REST client to post and query documents.

I do not get an accurate document count when I issue a GET immediately after a POST. If I sleep for more than 1 second between the POST and the GET, the GET returns the correct count.

My flow of operations is: 1) post a new transaction (document) using a POST operation; 2) retrieve the total document count using a GET operation, immediately after the POST.

I suspect that Elasticsearch takes some time to update the index. Is this a problem with Elasticsearch itself or with one of my settings? Please help.

Srinivas KK

1 Answer


That's normal behavior! When you index new data, it isn't immediately available for search, but it will be after the next refresh, which happens once every second by default.

If that bothers you, you have a few options:

  1. You can call the /_refresh endpoint after POSTing your documents; that immediately refreshes your index, so the next GET call will see them.
  2. You can add the ?refresh=true parameter to the POST call that indexes your documents, which does basically the same as 1.
  3. You can add the ?refresh=wait_for parameter to the POST call that indexes your documents; the call will only return once a refresh has occurred, so the next GET call will return the documents.
  4. You can decrease index.refresh_interval in your index settings (it defaults to 1 second) so that refresh operations happen more often.

Just know that, from a performance standpoint, the least aggressive way of achieving what you want is 3. It is a new parameter introduced in ES 5 that does not force a refresh on your index; the call simply returns once the newly indexed documents are available for search. Forcing a refresh too often (1, 2 and 4) can kill your performance.
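
As a rough sketch of what the four options above look like as requests, here is a small client-agnostic Java helper that only builds the endpoints, query parameters, and settings body. The class and method names, the index name `transactions`, the type name `txn`, and the `500ms` interval are all made up for illustration; nothing here talks to a cluster. With the 5.x low-level REST client, these values would typically be handed to `performRequest(method, endpoint, params, entity)`, but check that against your client version.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper: builds the pieces of each request, sends nothing.
class RefreshOptions {

    // Endpoint for indexing a document with an auto-generated id: POST /{index}/{type}
    static String indexEndpoint(String index, String type) {
        return "/" + index + "/" + type;
    }

    // Endpoint for counting documents: GET /{index}/_count
    static String countEndpoint(String index) {
        return "/" + index + "/_count";
    }

    // Option 1: explicit refresh after indexing: POST /{index}/_refresh
    static String refreshEndpoint(String index) {
        return "/" + index + "/_refresh";
    }

    // Options 2 and 3: query parameter added to the indexing request.
    // waitFor == false -> refresh=true     (forces a refresh right away)
    // waitFor == true  -> refresh=wait_for (returns after the next refresh)
    static Map<String, String> refreshParam(boolean waitFor) {
        return Collections.singletonMap("refresh", waitFor ? "wait_for" : "true");
    }

    // Option 4: JSON body for PUT /{index}/_settings to change the interval.
    static String refreshIntervalSettings(String interval) {
        return "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
    }

    public static void main(String[] args) {
        String index = "transactions"; // assumed index name
        String type = "txn";           // assumed type name (ES 5 still uses types)

        System.out.println("1: POST " + refreshEndpoint(index));
        System.out.println("2: POST " + indexEndpoint(index, type) + " params=" + refreshParam(false));
        System.out.println("3: POST " + indexEndpoint(index, type) + " params=" + refreshParam(true));
        System.out.println("4: PUT /" + index + "/_settings body=" + refreshIntervalSettings("500ms"));
        System.out.println("then: GET " + countEndpoint(index));
    }
}
```

For the POST-then-count flow in the question, option 3 means passing the `refresh=wait_for` parameter on the indexing request and issuing the count only after that call returns.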

Val
  • Thanks Val. I was worried about the performance cost of refreshing. Although I gave sample data here, my actual use case has 30 txns per second and needs high performance as well as the correct count without waiting for 1 second. Since that is normal Elasticsearch behavior, I have to choose an option, and I am leaning towards option 3. Thanks for the detailed answer with references. – Srinivas KK Feb 23 '17 at 05:21
  • In this case, this answer might also help: http://stackoverflow.com/questions/31499575/how-to-deal-with-elasticsearch-index-delay/34391272#34391272 – Val Feb 23 '17 at 06:11
  • Thanks for pointing to a workaround. As I said, it's a high-volume application. Could we get into a situation like this: I post 1 txn and call GET, and the index refreshes after 1 second; within that 1 second, I submit 10 more transactions. For the 11th transaction, my GET comes just after 1 second from the 1st txn's POST. Will the GET for my 11th transaction include all 11 transactions? Or is every POST refreshed exactly 1 second later, even though there are refreshes in between? – Srinivas KK Feb 23 '17 at 14:29
  • You can do any number of POSTs within each refresh window; when the refresh happens, anything that was POSTed before that refresh becomes available for search – Val Feb 23 '17 at 14:35
  • Thanks for clarifying that. – Srinivas KK Feb 23 '17 at 14:45