I have an Apache Beam streaming job that reads data from Kafka and writes to Elasticsearch using ElasticsearchIO.

The issue I'm having is that the messages in Kafka already have a key field, and I'm using ElasticsearchIO.Write.withIdFn() to map this field to the document _id field in Elasticsearch.

Because the volume of data is large, I don't want the key field to also be written to Elasticsearch as part of _source.

Is there an option/workaround that would allow doing that?

marknorkin

2 Answers

Using the Ingest API and the remove processor, you can solve this fairly easily inside your Elasticsearch cluster alone. You can also simulate an ingest pipeline and inspect the results before applying it.

I've prepared an example that will probably cover your case:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "remove id from incoming docs",
    "processors": [
      {"remove": {
        "field": "id",
        "ignore_failure": true
      }}
    ]
  },
  "docs": [
      {"_source":{"id":"123546", "other_field":"other value"}}
    ]
}

As you can see, the test document contains a field "id". This field is no longer present in the response:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "other_field" : "other value"
        },
        "_ingest" : {
          "timestamp" : "2018-12-03T16:33:33.885909Z"
        }
      }
    }
  ]
}
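
To go from the simulation to a real pipeline, the same processor can be registered under a name and attached to the target index. The following is only a minimal sketch: the cluster URL, the pipeline name "drop-id-field" and the index name "my-index" are made-up placeholders, and the `index.default_pipeline` setting is assumed to be available on your Elasticsearch version (6.5 or later, as far as I know). The functions only build the HTTP requests; nothing is sent until you pass them to `urllib.request.urlopen`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: local cluster without auth
PIPELINE_NAME = "drop-id-field"    # hypothetical pipeline name
INDEX_NAME = "my-index"            # hypothetical index name


def build_put_pipeline_request(es_url, pipeline_name, field):
    """Build PUT _ingest/pipeline/<name> with a single remove processor."""
    body = {
        "description": "remove %s from incoming docs" % field,
        "processors": [
            {"remove": {"field": field, "ignore_failure": True}}
        ],
    }
    return urllib.request.Request(
        url="%s/_ingest/pipeline/%s" % (es_url, pipeline_name),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_default_pipeline_request(es_url, index, pipeline_name):
    """Build PUT <index>/_settings so every indexing request to the index
    runs through the pipeline without the client having to ask for it."""
    body = {"index.default_pipeline": pipeline_name}
    return urllib.request.Request(
        url="%s/%s/_settings" % (es_url, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


pipeline_req = build_put_pipeline_request(ES_URL, PIPELINE_NAME, "id")
settings_req = build_default_pipeline_request(ES_URL, INDEX_NAME, PIPELINE_NAME)
# To actually apply them against a running cluster:
# urllib.request.urlopen(pipeline_req)
# urllib.request.urlopen(settings_req)
```

Since the remove processor only touches _source, the _id that the writer sets in the bulk request metadata should stay as it is. Whether the Beam writer itself can pass a pipeline name is not something I have checked, which is why the sketch attaches the pipeline on the index side instead.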
ibexit

I've created a ticket in Apache Beam JIRA describing this issue.

For now, the original issue cannot be resolved as part of the indexing process using the Apache Beam API.

The workaround proposed by Etienne Chauchot, one of the maintainers, is to run a separate task that removes the field from the indexed data afterwards.

See "Remove a field from a Elasticsearch document" for an example.
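
One way such a cleanup task could look is an `_update_by_query` call with a small Painless script that drops the field from _source. This is a rough sketch under assumptions, not what the maintainers prescribed: the cluster URL, the index name "my-index" and the field name "id" are placeholders. The function only builds the request, so it can be inspected before it is sent.

```python
import json
import urllib.request


def build_remove_field_request(es_url, index, field):
    """Build POST <index>/_update_by_query that removes `field` from _source
    of every document which still contains it."""
    body = {
        "script": {
            "lang": "painless",
            # _source is exposed as a map, so remove() drops the key
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
        # only touch documents that still carry the field
        "query": {"exists": {"field": field}},
    }
    return urllib.request.Request(
        # conflicts=proceed: keep going if a document changed in the meantime
        url="%s/%s/_update_by_query?conflicts=proceed" % (es_url, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholders: adjust URL, index and field to your setup.
cleanup_req = build_remove_field_request("http://localhost:9200", "my-index", "id")
# To run it against a live cluster:
# urllib.request.urlopen(cleanup_req)
```

Because the query matches only documents that still have the field, the task can be rerun periodically on a streaming index and will skip documents it has already cleaned.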

If you would also like to have such a feature in the future, you may want to follow the linked ticket.

marknorkin