I need to update a field of a document in Elasticsearch from Python code, adding to it the number of times that document's id occurs in a list. The weight field holds the count of the document in a dataset. The dataset is updated from time to time, so the count of each document must be updated too. hashed_ids is a list of the document ids in the new batch of data. The weight of each matched id must be increased by the count of that id in hashed_ids. I tried the code below, but it does not work.

hashed_ids = [hashlib.md5(doc.encode('utf-8')).hexdigest() for doc in shingles]
update_with_query_body = {
        "script": {
            "source": "ctx._source.content_completion.weight +=param.count",
            "lang": "painless",
            "param": {
                "count": hashed_ids.count("ctx.['_id']")
            }
        },
        "query": {
            "ids": {
                "values": hashed_ids
            }
        }
    }

For example, let's say a doc with id=d1b145716ce1b04ea53d1ede9875e05a and weight=5 is already present in the index, and the string d1b145716ce1b04ea53d1ede9875e05a appears three times in hashed_ids, so the query shown above will match that doc. I need to add 3 to 5 and get 8 as the final weight.

Marzi Heidari

1 Answer

I'm not familiar with Python, but here is an example-based solution with a few assumptions. Let's say the following hashed_ids were extracted:

hashed_ids = ["id1","id1","id1","id2"]

To use it in a terms query, we only need the unique list of ids, i.e.

hashed_ids_unique = ["id1", "id2"]
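
On the Python side, the unique list could be derived from hashed_ids roughly as follows. This is only a sketch; id_counts is a hypothetical name that is not part of the question, shown to illustrate the per-id counts that the update is meant to add.

```python
from collections import Counter

# Example batch from above; in the question this list comes from the md5 hashes.
hashed_ids = ["id1", "id1", "id1", "id2"]

# dict.fromkeys drops duplicates while keeping first-seen order.
hashed_ids_unique = list(dict.fromkeys(hashed_ids))

# Counter maps each id to the number of times it occurs in the batch
# (id_counts is an illustrative name, not something from the question).
id_counts = Counter(hashed_ids)
```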

Let's assume the doc(s) are indexed with the structure below:

PUT test/_doc/1
{
  "id": "id1",
  "weight":9
}

Now we can use update by query as below:

POST test/_update_by_query
{
  "query":{
    "terms": {
      "id":["id1","id2"]
    }
  },
  "script":{
    "source":"long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count(); ctx._source.weight += weightToAdd;",
    "params":{
      "hashed_ids":["id1","id1","id1","id2"]
    }
  }
}
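
Since the question is about Python, the same request body might be assembled there along these lines. This is a minimal sketch under the assumptions above (an index named test, documents with id and weight fields); build_update_body is a hypothetical helper, and the client call in the final comment assumes the official elasticsearch Python package, whose exact arguments can differ between versions.

```python
def build_update_body(hashed_ids):
    """Build an update-by-query body mirroring the request above.

    Hypothetical helper: the field names "id" and "weight" are the
    assumptions made in this answer, not the question's real mapping.
    """
    # The terms query only needs each id once; order is preserved.
    unique_ids = list(dict.fromkeys(hashed_ids))
    return {
        "query": {"terms": {"id": unique_ids}},
        "script": {
            "lang": "painless",
            # Same script as above: count how often the doc's id occurs
            # in the full (non-deduplicated) list and add that to weight.
            "source": (
                "long weightToAdd = params.hashed_ids.stream()"
                ".filter(idFromList -> ctx._source.id.equals(idFromList))"
                ".count(); "
                "ctx._source.weight += weightToAdd;"
            ),
            # Note the key is "params" (plural), with the full list.
            "params": {"hashed_ids": list(hashed_ids)},
        },
    }


body = build_update_body(["id1", "id1", "id1", "id2"])

# Assumed usage with the elasticsearch client (not executed here):
#   es.update_by_query(index="test", body=body)
```

The counting has to happen inside the script because the Python code that builds the body runs once, before Elasticsearch knows which document it is updating.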

Explanation for script:

The following gives the count of matching ids in the hashed_ids list for the id of the current matching doc.

long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count();

The following adds weightToAdd to the existing value of weight in the document.

ctx._source.weight += weightToAdd;
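
To check the arithmetic without a cluster, the two script statements can be mimicked in plain Python. This is only a local simulation of the logic, not how Elasticsearch executes it; simulate_update is a hypothetical name.

```python
def simulate_update(source, hashed_ids):
    """Mimic the two script statements on a single document's _source."""
    # Step 1: count the entries in hashed_ids equal to this doc's id.
    weight_to_add = sum(
        1 for id_from_list in hashed_ids if id_from_list == source["id"]
    )
    # Step 2: add that count to the existing weight (on a copy).
    updated = dict(source)
    updated["weight"] = source["weight"] + weight_to_add
    return updated


hashed_ids = ["id1", "id1", "id1", "id2"]

# The sample doc indexed above: weight 9, and id1 occurs three times.
doc = simulate_update({"id": "id1", "weight": 9}, hashed_ids)
```

For the sample doc the weight goes from 9 to 12, and the question's example (weight 5, id repeated three times) would end at 8 in the same way.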
Nishant