I have a Lambda that receives events from Kinesis and writes each event to an Elasticsearch cluster.

| doc id | FirstTimestamp |
|--------|----------------|
| d1     | 15974343498    |

When we receive another event, I want to update the document in Elasticsearch to:

| doc id | FirstTimestamp | SecondTimestamp | TimeTag |
|--------|----------------|-----------------|---------|
| d1     | 15974343498    | 15974344498     | 1000    |

How can I do this without first having to GET the existing doc from Elasticsearch and then doing a PUT?

I found the update option here, which lets me add the SecondTimestamp field, but how can I add the TimeTag field? Computing it requires the FirstTimestamp that is already stored in the document.

Sonu Mishra

1 Answer

The GET operation won't be necessary.

Depending on how easily you can configure how your writes happen, you could do the following:

  1. Store a script which expects the doc-to-be-updated content as params:
POST _scripts/manage_time_tags
{
  "script": {
    "lang": "painless", 
    "source": """
      if (ctx._source.FirstTimestamp != null && params.FirstTimestamp != null) {
        ctx._source.SecondTimestamp = params.FirstTimestamp;
        ctx._source.TimeTag = ctx._source.SecondTimestamp - ctx._source.FirstTimestamp;
      }
    """
  }
}
  2. Instead of writing directly to ES as you have been doing, use the upsert option of the Update API:
POST myindex/_update/1
{
  "upsert": {
    "id": 1,
    "FirstTimestamp": 15974343498
  },
  "script": {
    "id": "manage_time_tags",
    "params": {
      "id": 1,
      "FirstTimestamp": 15974343498
    }
  }
}

This ensures that if the document does not exist yet, the contents of upsert are indexed as the new document and the script does not run at all.

  3. As new events come in, simply call /_update/your_id again, this time with the new event's id and timestamp (still passed under the name FirstTimestamp, which the script stores as SecondTimestamp).
POST myindex/_update/1
{
  "upsert": {
    "id": 1,
    "FirstTimestamp": 15974344498
  },
  "script": {
    "id": "manage_time_tags",
    "params": {
      "id": 1,
      "FirstTimestamp": 15974344498
    }
  }
}
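
To make the effect of the three steps concrete, here is a minimal in-memory sketch in Python. It is not an Elasticsearch client: `store`, `manage_time_tags` and `update` are hypothetical stand-ins that only imitate the upsert-then-script behaviour described above, so treat it as an illustration of the logic rather than of the real API.

```python
# In-memory model of the upsert behaviour; nothing here talks to Elasticsearch.
store = {}  # hypothetical stand-in for the index: doc id -> _source


def manage_time_tags(source, params):
    """Python rendering of the stored Painless script from step 1."""
    if source.get("FirstTimestamp") is not None and params.get("FirstTimestamp") is not None:
        source["SecondTimestamp"] = params["FirstTimestamp"]
        source["TimeTag"] = source["SecondTimestamp"] - source["FirstTimestamp"]


def update(doc_id, upsert, params):
    """Insert `upsert` if the doc is missing, otherwise run the script."""
    if doc_id not in store:
        store[doc_id] = dict(upsert)  # first write: the script is skipped
        return "created"
    manage_time_tags(store[doc_id], params)
    return "updated"


first = {"id": 1, "FirstTimestamp": 15974343498}
second = {"id": 1, "FirstTimestamp": 15974344498}

r1 = update(1, first, first)    # step 2: document does not exist yet
r2 = update(1, second, second)  # step 3: document exists, script runs

print(store[1])
# {'id': 1, 'FirstTimestamp': 15974343498, 'SecondTimestamp': 15974344498, 'TimeTag': 1000}
```

Note that in this model, as in the script itself, a third event would overwrite SecondTimestamp and TimeTag rather than add new fields.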

Note: this should not be confused with the rather poorly named scripted_upsert option, which runs the script regardless of whether the doc already exists. That option should be omitted (or set to false).
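
On the Lambda side, the request for each event could be assembled roughly like this. It is only a sketch: Kinesis does deliver record data base64-encoded under `Records[].kinesis.data`, but the payload shape (JSON with `id` and `timestamp` keys) and the helper names `build_update` and `updates_from_kinesis` are assumptions for illustration. The body deliberately leaves out scripted_upsert, and sending the request is left to whatever HTTP or Elasticsearch client you already use.

```python
import base64
import json

SCRIPT_ID = "manage_time_tags"  # stored script from step 1
INDEX = "myindex"               # index name used in the examples above


def build_update(doc_id, timestamp):
    """Return (path, body) for one Update API call with a plain upsert."""
    doc = {"id": doc_id, "FirstTimestamp": timestamp}
    body = {
        "upsert": dict(doc),  # used only when the doc does not exist yet
        "script": {"id": SCRIPT_ID, "params": dict(doc)},
        # no "scripted_upsert" key: the script must not run on first insert
    }
    return f"/{INDEX}/_update/{doc_id}", body


def updates_from_kinesis(event):
    """Decode each Kinesis record and build the matching update request."""
    requests_to_send = []
    for record in event["Records"]:
        # Assumed payload: JSON object with "id" and "timestamp" keys.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        requests_to_send.append(build_update(payload["id"], payload["timestamp"]))
    return requests_to_send
```

Each returned pair can then be POSTed as-is, so the Lambda never needs to read the document first.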

Joe - GMapsBook.com
  • Glad it helped! Hey, I'm writing an Elasticsearch handbook where I discuss real-world, non-trivial use cases like the one you were solving. There is a chapter or two on efficient ingestion & ingest pipelines and I think it'd bring you value. I'm close to finishing it but [let me know](https://jozefsorocin.typeform.com/to/XeQRxdwV) what else you'd like to learn about and I'll let you know when the handbook is out! – Joe - GMapsBook.com Jan 27 '21 at 10:38
  • Thanks for the answer. How does it impact the performance? – Sonu Mishra Jan 28 '21 at 00:08
  • No problem. It depends on too many variables. On the one hand, you'd be accessing `_source`, which is "costly" performance-wise; on the other hand, it's a simple one-doc operation. So it should be fine. There's no other way to achieve what you're after without an `upsert`. – Joe - GMapsBook.com Jan 28 '21 at 08:52