5

Situation: I have an index with strict mapping and I want to delete an old field from it which is no longer used. So I create a new index with mapping that doesn't include that field and I try to reindex the data into the new index.

Problem: When I reindex, I get an error, because I'm trying to index data into a field that is not available in the mapping. So to solve this, I want to remove that field from all documents in the original index first, before I can reindex.

PUT old_index/_doc/1
{
    "field_to_delete" : 5
}
PUT old_index/_doc/2
{
    "field_to_delete" : null
}
POST _reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
"reason": "mapping set to strict, dynamic introduction of [field_to_delete] within [new_index] is not allowed"

1. Some places I found suggest doing:

POST old_index/_doc/_update_by_query
{
  "script": "ctx._source.remove('field_to_delete')",
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "field_to_delete"
          }
        }
      ]
    }
  }
}

However that doesn't match documents that have an explicit value of null, so reindexing still fails after this update.

2. Others (like members of the Elastic team in their official forum) suggest doing something like:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
          if (ctx._source.field_to_delete != null) {
            ctx._source.remove("field_to_delete");
          } else {
            ctx.op="noop";
          }
      """
    }
  },
  "query": {
    "match_all": {}
  }
}

However this has the same problem - it doesn't remove the second document that has an explicit value of null.

3. In the end I could just do:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": "ctx._source.remove("field_to_delete");"}
  },
  "query": {
    "match_all": {}
  }
}

But this will update all documents and for a large index could mean additional downtime during deployment.

pmishev
  • 896
  • 9
  • 22

1 Answers1

10

Eventually I found the correct way to do it, so I'm sharing it for the general knowledge:

POST old_index/_doc/_update_by_query
{
  "script": {
    "source": """
        if (ctx._source.containsKey("field_to_delete")) {
            ctx._source.remove("field_to_delete");
        } else {
          ctx.op="noop";
        }
      """
  },
  "query": {
    "match_all": {}
  }
}
Nikita Vlasenko
  • 4,004
  • 7
  • 47
  • 87
pmishev
  • 896
  • 9
  • 22
  • I was going to suggest using `containsKey`, good catch ;-) ES will still have to iterate over all the documents, but only those having `field_to_delete` will effectively be updated. – Val Jun 18 '19 at 12:43
  • you should use an exists query instead of the match_all. Then your script will only be run on document with the field to delete. See here https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html – Pierre Mallet Jun 18 '19 at 13:38
  • @PierreMallet: Unfortunately that wouldn't work, as the query would not return the document with `"field_to_delete" : null` and it won't be updated – pmishev Jun 18 '19 at 15:50
  • Oh yes you are right :/ you could maybe work around this with an explicit null_value. See here https://www.elastic.co/guide/en/elasticsearch/reference/current/null-value.html – Pierre Mallet Jun 18 '19 at 20:54