
I'm wondering whether I can create a "dynamic mapping" within an Elasticsearch index. The problem I'm trying to solve is this: my schema has an attribute containing an object whose shape can differ greatly between records. I'd like to mirror this data in Elasticsearch if possible, but I believe automatic mapping may get in the way.

Imagine a scenario where I have a schema like the following:

{
    name: string
    origin: string
    payload: object // can be of any type / schema
}

Is it possible to create a mapping that supports this? I don't need to query the records by this payload attribute, but it would be great if I could.

Note that I have checked the documentation, but I'm confused about whether what Elastic calls "dynamic mapping" is what I'm looking for.

user2737876

1 Answer


It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.

Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to query on, and you'd like to stop Elasticsearch from autogenerating index-time mappings for the rest of the payload's fields, you can use dynamic templates like so:

PUT my-payload-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "variable_payload": {
          "path_match": "payload",
          "mapping": {
            "type": "object",
            "dynamic": false,
            "properties": {
              "created_at": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss"
              },
              "livemode": {
                "type": "boolean"
              }
            }
          }
        }
      }
    ],
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "origin": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Then, as you ingest your docs:

POST my-payload-index/_doc
{
  "name": "abc",
  "origin": "web.dev",
  "payload": {
    "created_at": "2021-04-05 08:00:00",
    "livemode": false,
    "abc":"def"
  }
}

POST my-payload-index/_doc
{
  "name": "abc",
  "origin": "web.dev",
  "payload": {
    "created_at": "2021-04-05 08:00:00",
    "livemode": true,
    "modified_at": "2021-04-05 09:00:00"
  }
}

and verify with

GET my-payload-index/_mapping

that no new mappings were generated for payload.abc or payload.modified_at.

Not only that: these extra fields are also ignored at index time, per the documentation:

These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
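To make that behavior concrete, here is a toy model in plain Python (it does not talk to Elasticsearch, and the function name split_payload is hypothetical). It assumes the only mapped subfields are the two from the template above and shows how the two sample payloads would split into indexed fields versus fields that survive only in _source.

```python
# Toy model (not Elasticsearch code) of an object mapped with "dynamic": false:
# subfields listed under "properties" are indexed; anything else is kept
# only in _source and is not searchable.
from typing import Any


def split_payload(mapped: set[str], payload: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (indexed, source_only) views of a payload object."""
    indexed = {k: v for k, v in payload.items() if k in mapped}
    source_only = {k: v for k, v in payload.items() if k not in mapped}
    return indexed, source_only


# Mirrors the "properties" block inside the dynamic template above.
MAPPED = {"created_at", "livemode"}

doc1 = {"created_at": "2021-04-05 08:00:00", "livemode": False, "abc": "def"}
doc2 = {"created_at": "2021-04-05 08:00:00", "livemode": True, "modified_at": "2021-04-05 09:00:00"}

for doc in (doc1, doc2):
    indexed, source_only = split_payload(MAPPED, doc)
    print(sorted(indexed), sorted(source_only))
# ['created_at', 'livemode'] ['abc']
# ['created_at', 'livemode'] ['modified_at']
```

The model only looks at top-level payload keys; in Elasticsearch, anything nested under an unmapped key is likewise left out of the index.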

Side note: since these fields are neither indexed nor searchable, yet still present in _source, they behave much like the contents of an object mapped with "enabled": false.


The Big Picture

Working with variable contents of a single, top-level object is quite standard. Take, for instance, the Stripe event object: each event has an id, an api_version, and a few other shared params. Then there's the data object, which is analogous to your payload field.

All is fine until you need to aggregate on the contents of your payload. Since the content is variable, so are the data paths (accessors). Aggregations don't accept wildcards in field paths, and while scripts do work, they're onerous to maintain.
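To see why the accessors vary, the sketch below collects the dotted field paths present across a few payloads. The payload shapes and helper names are hypothetical and nothing here calls Elasticsearch; the point is that each distinct path is a separate field, and an aggregation has to name one concrete field, so a path like payload.amount covers only the documents that happen to have it.

```python
# Sketch: list the dotted paths found in heterogeneous payloads and count
# how many payloads contain each one.
from typing import Any, Iterable


def leaf_paths(obj: dict[str, Any], prefix: str = "payload") -> set[str]:
    """Return dotted paths to every non-object value inside obj."""
    paths: set[str] = set()
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            paths |= leaf_paths(value, path)  # recurse into sub-objects
        else:
            paths.add(path)
    return paths


def paths_across(payloads: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Count how many payloads contain each dotted path."""
    counts: dict[str, int] = {}
    for p in payloads:
        for path in leaf_paths(p):
            counts[path] = counts.get(path, 0) + 1
    return counts


# Hypothetical payloads of two different shapes.
payloads = [
    {"object": "charge", "amount": 500, "outcome": {"type": "authorized"}},
    {"object": "customer", "email": "a@example.com"},
]
for path, n in sorted(paths_across(payloads).items()):
    print(path, n)
```

Only payload.object appears in both documents; every other path is shape-specific and would need its own aggregation.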

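Back to Stripe. They partially solved this with what they call polymorphic, typed hashes, as discussed in their blog on API design. In short, an object carries a type field, and its type-specific attributes live in a sub-hash named after that type (for example, a payment method with "type": "card" keeps its card details under "card").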

A pretty neat approach that's worth emulating.
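Applied to your payload field, the pattern could look like the minimal Python sketch below. The helper to_typed_hash and the "charge" kind with its fields are hypothetical, chosen only for illustration; the idea is that every kind of payload ends up under its own predictable path.

```python
# Sketch of the typed-hash pattern: wrap each variable payload as
# {"type": <kind>, <kind>: {...}} so each kind lives under a fixed path.
from typing import Any


def to_typed_hash(kind: str, body: dict[str, Any]) -> dict[str, Any]:
    """Nest body under a key named after its kind, next to a "type" discriminator."""
    if kind == "type":
        # The kind name would collide with the discriminator key itself.
        raise ValueError('"type" is reserved for the discriminator')
    return {"type": kind, kind: body}


doc = {
    "name": "abc",
    "origin": "web.dev",
    # Hypothetical kind and fields, for illustration only.
    "payload": to_typed_hash("charge", {"amount": 500, "currency": "usd"}),
}
print(doc["payload"])
# {'type': 'charge', 'charge': {'amount': 500, 'currency': 'usd'}}
```

With this shape you could map payload.type as a keyword and give each kind's sub-object its own explicit mapping, so an aggregation on, say, payload.charge.amount targets one fixed field, optionally filtered by payload.type.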


P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.

Joe - GMapsBook.com