The purpose of this question is to ask the community how to go about partially updating a field without removing any other contents of that field.
There are many examples on Stack Overflow of partially updating Elasticsearch _source fields using Python, curl, etc. The elasticsearch Python library ships with an elasticsearch.helpers module whose functions (parallel_bulk, streaming_bulk, bulk) let developers update documents easily.
If the data is in a pandas DataFrame, one can iterate over the rows to build a generator that creates or updates documents in Elasticsearch. Elasticsearch documents are immutable, so when an update occurs Elasticsearch builds a new document from the information passed in, applies the changes, and increments the document's version. If a field holds a list and the update request supplies a single value for that field, the entire list is replaced by that value (many SO Q&As cover this). I do not want to replace the whole list; I want to change a single value in the list to a new value.
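To make the setup concrete, here is a minimal sketch of the kind of generator I mean. It assumes each row is a plain dict (for example from df.to_dict("records")) with the hypothetical keys id and address, and the function name generate_update_actions is made up for illustration. It only builds the actions; sending them would need a live client.

```python
def generate_update_actions(rows, index_name):
    """Yield one partial-update action per row, in the dict shape
    that elasticsearch.helpers.bulk accepts."""
    for row in rows:
        yield {
            "_op_type": "update",
            "_index": index_name,
            "_id": row["id"],  # hypothetical column holding the document id
            # A plain "doc" update overwrites the whole field, lists included.
            "doc": {"address": row["address"]},
        }


rows = [{"id": "19010239", "address": "1001 country drive"}]
actions = list(generate_update_actions(rows, "docobot"))
# With a live cluster (not run here): helpers.bulk(es, actions),
# where es is an Elasticsearch client instance.
```

Sent as-is, an action like this replaces the whole address list with the single string, which is exactly the behaviour I am trying to avoid.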
For example, my _source has a field whose value is ['101 country drive', '35 park drive', '277 thunderroad belway']. The field has three values, but let's say we realize the document is incorrect and we need to change '101 country drive' to '1001 country drive'.
I do not want to delete the other values in the list; I only want to replace that one element with the new value.
Do I need to write a painless script to perform this action, or is there another method to perform this action?
Example: I want to update the document from
{'took': 176,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 0, 'relation': 'eq'},
'max_score': None,
'hits': [{'_index': 'docobot', '_type': '_doc', '_id': '19010239',
'_source': {'name': 'josephine drwaler', 'address': ['101 country drive', '35 park drive', '277 thunderroad belway']
}}]}}
to
{'took': 176,
'timed_out': False,
'_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
'hits': {'total': {'value': 0, 'relation': 'eq'},
'max_score': None,
'hits': [{'_index': 'docobot', '_type': '_doc', '_id': '19010239',
'_source': {'name': 'josephine drwaler', 'address': ['1001 country drive', '35 park drive', '277 thunderroad belway']
}}]}}
Notice that only the first element of address changes here, but the position of the value in the list should not matter: I want to replace the old value wherever it appears in address.
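For reference, this is roughly what I imagine the scripted route would look like. It is only a sketch: the helper build_replace_action and the parameter names old_value and new_value are hypothetical, and it assumes address is always stored as a list. The Painless source looks the old value up by content rather than by position and overwrites it in place.

```python
# Painless source: locate the old value by content and overwrite that element.
# If the value is not present, the list is left unchanged.
REPLACE_SCRIPT = (
    "int i = ctx._source.address.indexOf(params.old_value); "
    "if (i >= 0) { ctx._source.address.set(i, params.new_value); }"
)


def build_replace_action(index_name, doc_id, old_value, new_value):
    """Build one scripted-update action in the dict shape
    that elasticsearch.helpers.bulk accepts."""
    return {
        "_op_type": "update",
        "_index": index_name,
        "_id": doc_id,
        "script": {
            "lang": "painless",
            "source": REPLACE_SCRIPT,
            # Hypothetical parameter names, passed to the script as params.*
            "params": {"old_value": old_value, "new_value": new_value},
        },
    }


action = build_replace_action(
    "docobot", "19010239", "101 country drive", "1001 country drive"
)
# With a live cluster (not run here): helpers.bulk(es, [action]),
# where es is an Elasticsearch client instance.
```

I have not confirmed whether this is the recommended approach, hence the question.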
What is the most efficient and Pythonic way to partially update documents in Elasticsearch while keeping the remaining values in that field intact?