
I am facing an issue with the UpdateByQuery API when trying to update a document that doesn’t exist in Elasticsearch

Problem description

  1. We create one index per day, e.g. test_index-2020.03.11, test_index-2020.03.12, …, and we keep eight days of indexes (today’s plus those of the previous seven days).

  2. When data arrives (read one by one or in bulk from a Kafka topic), we need to either update the document if one with the given ID already exists (it may be in any of the eight daily indexes) or save it to the current day’s index if it does not.
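
To make the naming scheme concrete, here is a minimal Java sketch that generates the eight index names for a given day. It is only an illustration, not our production code: the class and method names are made up for this example, and the date is passed in explicitly so the result is reproducible.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class DailyIndexNames {
    // Suffix format taken from the index names in the post, e.g. 2020.03.11
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    /** Returns the index names for today and the previous seven days, newest first. */
    static List<String> lastEightDays(String prefix, LocalDate today) {
        List<String> names = new ArrayList<>();
        for (int daysBack = 0; daysBack < 8; daysBack++) {
            names.add(prefix + "-" + today.minusDays(daysBack).format(SUFFIX));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the eight names from test_index-2020.03.12 back to test_index-2020.03.05
        System.out.println(lastEightDays("test_index", LocalDate.of(2020, 3, 12)));
    }
}
```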

The solution I am currently trying when data arrives one by one:

  • Use UpdateByQuery with an inline script to update the doc

  • If the BulkByScrollResponse returns an updated count of 0, save the doc
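
The two steps above boil down to the control flow sketched below. This is a simplified Java illustration under stated assumptions: `DocStore` is a hypothetical interface standing in for the two Elasticsearch calls (it is not part of any Elasticsearch client), `InMemoryStore` exists only so the flow can be run, and the `state` field is invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class UpdateOrSave {
    /** Hypothetical stand-in for the two Elasticsearch calls; not a real client API. */
    interface DocStore {
        /** Updates docs whose ID equals id and returns how many were updated. */
        long updateById(String id, Map<String, Object> fields);

        /** Saves a new doc (in the post: into the current day's index). */
        void save(String id, Map<String, Object> fields);
    }

    /** Tries the update first and saves only when the update touched nothing. */
    static String updateOrSave(DocStore store, String id, Map<String, Object> fields) {
        long updated = store.updateById(id, fields);
        if (updated == 0) {
            store.save(id, fields);
            return "saved";
        }
        return "updated";
    }

    /** In-memory store used only to exercise the flow above. */
    static class InMemoryStore implements DocStore {
        final Map<String, Map<String, Object>> docs = new HashMap<>();

        public long updateById(String id, Map<String, Object> fields) {
            Map<String, Object> existing = docs.get(id);
            if (existing == null) {
                return 0; // nothing matched, so nothing was updated
            }
            existing.putAll(fields);
            return 1;
        }

        public void save(String id, Map<String, Object> fields) {
            docs.put(id, new HashMap<>(fields));
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Map<String, Object> fields = new HashMap<>();
        fields.put("state", "Used"); // hypothetical field
        System.out.println(updateOrSave(store, "str1:str2", fields)); // prints "saved"
        System.out.println(updateOrSave(store, "str1:str2", fields)); // prints "updated"
    }
}
```

The save branch depends entirely on the update step returning 0 when no doc has the given ID, and that is exactly the count I am not getting from Elasticsearch.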

Issues:

Even when the doc doesn’t exist, the BulkByScrollResponse still reports a non-zero updated count (1, 2, 3, 4, …), for example:

BulkIndexByScrollResponse[sliceId=null,updated=1,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]

Because of this, the document save request is never triggered.

How should I approach the case where a batch of documents (with different doc IDs) needs to be updated, each with its own content, in a single request? Can I achieve this with UpdateByQuery?

Note: Given the amount of data to be processed per hour, we need to avoid sending multiple requests to Elasticsearch.

The doc ID has the format str1:str2:Used:Sat Mar 14 23:34:39 IST 2020

But even when the doc doesn't exist, I still see a non-zero updated count.

A few more points about the approach I am trying:

  • In my case there is always only one doc to update per request, since I am updating the doc that matches the given ID

  • We have configured shards and replicas as "number_of_shards": 10, "number_of_replicas": 1

  • We chose this approach because we don't know which index the doc actually resides in

If at most one document matches, the updated field of the response should never be greater than 1.

Here are a couple of the responses I get:

BulkIndexByScrollResponse[sliceId=null,updated=9,created=0,deleted=0,batches=1,versionConflicts=1,noops=0,retries=0,throttledUntil=0s]

BulkIndexByScrollResponse[sliceId=null,updated=10,created=0,deleted=0,batches=1,versionConflicts=0,noops=0,retries=0,throttledUntil=0s]

Om S patel
  • If BulkIndexByScrollResponse responds with updated = 1 it means that the document actually existed. Why do you think this wasn't the case? – Val Mar 15 '20 at 05:52
  • As I understand it, the updated count should tell how many docs were updated by this request; the following link says the same: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-update-by-query.html I have added more facts about the approach and the outcome. – Om S patel Mar 15 '20 at 12:40
  • My question was how do you know that the document didn't exist before the update ? – Val Mar 15 '20 at 12:59
  • By querying Elasticsearch through the client and Kibana. – Om S patel Mar 15 '20 at 15:57
  • Can you give a concrete example that backs your claims? – Val Mar 15 '20 at 16:10
  • Same problem here. Did you find any solution? – Dinesh Aug 31 '21 at 16:43
