I am new to Elasticsearch. I have a large index with around 50k documents, and I have to update all of them. When I run update_by_query, it throws this error:

File "E:\ApplicationsRunning\Lib\site-packages\opensearchpy\connection\http_urllib3.py", line 254, in perform_request
    raise ConnectionTimeout("TIMEOUT", str(e), e)
opensearchpy.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10)

How can I resolve this error or how can I update all the documents in the index?

query = {
    "script": {
        "inline": "ctx._source.name='srujan'"
    },
    "query": {
        "match_all": {}
    }
}
response = client.update_by_query(
    body=query, index=_index, wait_for_completion=True)
James Z

1 Answer

You're hitting the client's read timeout: the update takes longer than the 10 seconds the client waits for a response by default (the "read timeout=10" in the error message).

You can increase the timeout as shown by Musab in his comment, or...
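As a rough illustration of the first option, here is a minimal sketch that passes a longer per-request timeout to the call from the question. It assumes `client` is an opensearch-py client (as the traceback suggests) and relies on its `request_timeout` keyword; the function name `update_all_names` and the 300-second value are made up for this example. The script uses `source` with `params` rather than `inline`, which is the currently documented form.

```python
def update_all_names(client, index, new_name="srujan", timeout_s=300):
    """Update every document in `index`, waiting up to `timeout_s` seconds.

    `client` is assumed to be an opensearch-py client; `timeout_s` is an
    arbitrary example value, not a recommendation.
    """
    body = {
        "script": {
            # Passing the value through params avoids embedding it in the script text.
            "source": "ctx._source.name = params.new_name",
            "params": {"new_name": new_name},
        },
        "query": {"match_all": {}},
    }
    return client.update_by_query(
        body=body,
        index=index,
        wait_for_completion=True,
        # Per-request override of the client's default 10 s read timeout.
        request_timeout=timeout_s,
    )
```

The default can also be raised once for all requests by passing `timeout=...` when constructing the client, at the cost of slower failure detection on every other call.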

... you can also set wait_for_completion=False. The call then returns immediately with the ID of an asynchronous task that runs in the background.

You can then check the completion of this task in Kibana Dev Tools, using

GET _tasks/<task_id>
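The same check can be done from Python instead of Dev Tools. The sketch below assumes an opensearch-py client, whose `tasks.get` method wraps the `GET _tasks/<task_id>` endpoint; the helper names `start_update` and `wait_for_task`, the polling interval, and the overall deadline are all hypothetical choices for illustration.

```python
import time


def start_update(client, index, body):
    """Start update_by_query in the background and return its task ID."""
    resp = client.update_by_query(
        body=body, index=index, wait_for_completion=False
    )
    return resp["task"]  # the response carries the ID of the background task


def wait_for_task(client, task_id, poll_s=5.0, max_wait_s=3600.0):
    """Poll the Tasks API until the task reports completion.

    `poll_s` and `max_wait_s` are arbitrary example values.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        status = client.tasks.get(task_id=task_id)
        if status.get("completed"):
            return status  # includes the final counts under "response"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running after {max_wait_s}s")
        time.sleep(poll_s)
```

Each poll is a short request, so none of them should come near the 10-second read timeout even while the update itself runs for minutes.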
Val
  • Is there any other way? Can I make an update_by_query API call and send the index data in batches? If yes, how can I do that? – Srujan Gundeti Jan 24 '23 at 14:32
  • You can use some other query than `match_all` that will select fewer documents to update and send repeated update by query calls with different sub-queries – Val Jan 24 '23 at 14:35
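One way to read the suggestion in the last comment is sketched below: split the index into slices with a range query and issue one update_by_query call per slice. This assumes the documents have a numeric field to slice on; the field name `doc_id`, the helper names, and the batch size are hypothetical and need adapting to the real mapping.

```python
def range_queries(field, start, stop, step):
    """Build one range query per [lo, lo + step) slice of a numeric field."""
    return [
        {"range": {field: {"gte": lo, "lt": min(lo + step, stop)}}}
        for lo in range(start, stop, step)
    ]


def update_in_batches(client, index, sub_queries, script, timeout_s=120):
    """Run one update_by_query call per sub-query and sum the updated counts."""
    total = 0
    for query in sub_queries:
        resp = client.update_by_query(
            body={"script": script, "query": query},
            index=index,
            wait_for_completion=True,
            request_timeout=timeout_s,  # example value; each call covers fewer docs
        )
        total += resp.get("updated", 0)
    return total


# Example call, assuming a hypothetical numeric field named "doc_id":
# script = {"source": "ctx._source.name = 'srujan'"}
# update_in_batches(client, _index, range_queries("doc_id", 0, 50000, 5000), script)
```

Smaller slices keep each request short, but documents that lack the slicing field would be skipped, so the sub-queries have to cover the whole index between them.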