I have an object that contains an array, my_array:
"description": "My Object Description",
"my_array": [
    {
        "id": 1000,
        "name": "abc",
        "url": "abc.html",
        "content": "somebig content"
    },
    {
        "id": 1001,
        "name": "def",
        "url": "def.html",
        "content": "somebig content"
    },
    {
        "id": 1002,
        "name": "xyz",
        "url": "xyz.html",
        "content": "somebig content"
    }
]
Each element in the array contains a URL. Whenever this object changes, I have a task that hits the URL for each element of the array, gets the HTML content for that element, and creates a request document that can be indexed into Elasticsearch.
Let's say the URL for id = 1001 is not accessible, so the content for this element cannot be fetched. I still want to go ahead and process the changes for elements 1000 and 1002. In that case my update would look like this:
"description": "My New Object Description",
"my_array": [
    {
        "id": 1000,
        "name": "abc",
        "url": "abc-new-url.html",
        "content": "some modified content"
    },
    {
        "id": 1002,
        "name": "xyz",
        "url": "xyz-new-url.html",
        "content": "some modified content"
    }
]
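For reference, the task that produces such an update works roughly like the sketch below. The names build_update and fetch_html are only illustrative (fetch_html stands for whatever function downloads the HTML for a URL), and I am assuming an unreachable URL surfaces as an OSError, which is what urllib raises:

```python
from __future__ import annotations

from typing import Callable


def build_update(
    description: str,
    elements: list[dict],
    fetch_html: Callable[[str], str],  # hypothetical downloader, passed in
) -> dict:
    """Build the update document, skipping elements whose URL cannot be fetched."""
    refreshed = []
    for element in elements:
        try:
            content = fetch_html(element["url"])
        except OSError:
            # Assumption: an unreachable URL raises OSError (urllib.error.URLError
            # is a subclass). The element is simply left out of the update.
            continue
        # Copy the element and replace its content with the freshly fetched HTML.
        refreshed.append({**element, "content": content})
    return {"description": description, "my_array": refreshed}
```

Any element whose fetch fails is missing from my_array in the result, which is how 1001 drops out of the update.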
If I send this partial update to Elasticsearch, the array gets updated, but element 1001 is removed from it.
My problem is: how can I selectively update elements 1000 and 1002 without touching 1001? The index being stale for 1001 is OK for me here. One obvious choice is to fetch the existing doc from Elasticsearch and do the merging manually before doing the update. Is there any other way this partial update can be performed?
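To be concrete about the manual merge option, this is a minimal sketch of what I have in mind. The function name merge_by_id is just illustrative, it assumes every element has a unique "id", and fetching the existing doc and sending the update are left out:

```python
from __future__ import annotations


def merge_by_id(existing: list[dict], updated: list[dict]) -> list[dict]:
    """Merge updated elements into the existing array, matching on "id".

    Elements present only in `existing` are kept as-is (possibly stale),
    elements present in both are replaced by the updated version, and
    elements present only in `updated` are appended at the end.
    """
    updates = {element["id"]: element for element in updated}
    # Walk the existing array in order; swap in the updated element if there is one.
    merged = [updates.pop(element["id"], element) for element in existing]
    # Whatever is left in `updates` did not exist before, so append it.
    merged.extend(updates.values())
    return merged
```

With the arrays from above, the result keeps 1001 with its old content and carries the new url and content for 1000 and 1002. That merged array is what I would then send as my_array in the update.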
Another question: is there any way to send just the URL to Elasticsearch and write a plugin that fetches the HTML content at index time, rather than doing it beforehand?