Multi-language Elasticsearch mapping setup

Question

I have documents stored in MongoDB like so:

const demoArticle = {
  created: new Date(),
  title: [{
    language: 'english',
    value: 'This is the english title'
  }, {
    language: 'dutch',
    value: 'Dit is de nederlandse titel'
  }]
}

I want to add analyzers to specific languages, which is normally specified like so:

"mappings": {
   "article": {
      "properties": {
         "created": {
            "type": "date"
         },
         "title.value": {
           "type": "text",
           "analyzer": "english"
         }
      }
   }
}

The problem, however, is that each title entry should be analyzed with the analyzer matching the language set on that same child object.

I've stumbled upon Dynamic Templates in Elasticsearch, but I'm not convinced they are suited to this use case.

Any suggestions?

Depending on the number of languages you need to support, you could have one sub-field per language, i.e. `title_en.value`, `title_du.value`, etc., each with its own language analyzer. – Val, Jul 30 '18 at 07:46
To be honest, I don't understand the question. You are offering a second bounty but, in my opinion, you should provide way more details. @Val did offer you an idea above. Care to explain why that would or wouldn't work? Just throwing a bounty out there won't magically give you the perfect solution. Do answer questions and explain further. As it is now, this is a poorly detailed question imo and the upvotes it got are not deserved. – Andrei Stefan, Aug 07 '18 at 11:18
I agree with @AndreiStefan. Going by best practices, either use Val's suggestion or create one index per language, and then you can search across indices. Is your requirement to have all languages in one field, i.e. title? You can hack your way into it by writing a custom analyzer to create an inverted index based on some separator between the languages. – Polynomial Proton, Aug 07 '18 at 22:05
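
Val's suggestion of one field per language could be sketched roughly as below. The helper name `buildTitleMapping` and the `title_<language>` naming scheme are assumptions made for illustration, and the language names are assumed to match built-in Elasticsearch analyzer names such as `english` and `dutch`.

```javascript
// Hypothetical helper: builds the "properties" part of a mapping with one
// text field per language, each using the analyzer of the same name.
function buildTitleMapping(languages) {
  const properties = { created: { type: 'date' } };
  for (const language of languages) {
    // Assumed naming scheme: title_english, title_dutch, ...
    properties[`title_${language}`] = { type: 'text', analyzer: language };
  }
  return { properties };
}

const mapping = buildTitleMapping(['english', 'dutch']);
console.log(JSON.stringify(mapping, null, 2));
```

The resulting object would sit under the type name (`article` in the question) inside `mappings`; supporting another language then only means adding its name to the list.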

Akrion · Accepted Answer · 2018-08-03T02:21:07.867

10

If the language property of your MongoDB objects matches the exact name of an ES language analyzer, then all you need, following the approach recommended by Elastic, is to add:

{
  "mappings": {
    "article": {
      "properties": {
        "created": {
          "type": "date"
        },
        "title": {
          "type": "text",
          "fields": {
            "english": {
              "type": "text",
              "analyzer": "english"
            },
            "dutch": {
              "type": "text",
              "analyzer": "dutch"
            },
            "bulgarian": {
              "type": "text",
              "analyzer": "bulgarian"
            }
          }
        }
      }
    }
  }
}

This way you have a nice match on the language/analyzer field between MongoDB and ES.

edited Aug 03 '18 at 02:21

answered Aug 03 '18 at 02:15

Not sure it works that way since you cannot index different data in sub-fields, i.e. you cannot index english text in `title.english` and dutch text in `title.dutch`. You simply index one string in `title` and then each sub-field gets analyzed differently, but that's probably not what the OP wants. – Val Aug 03 '18 at 12:35
The problem is, we already have the production database setup like this, so we can't change the schema of our database. So this answer is not helping, although I appreciate the answer – randomKek Aug 07 '18 at 00:12
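
If the MongoDB schema has to stay as it is, one option is to reshape each document just before it is sent to Elasticsearch, so that every language ends up in its own field, as Val suggested in the question comments. The following is only a minimal sketch: the helper name `toIndexDocument` and the `title_<language>` field names are assumptions, and the index mapping would need a matching field with the right analyzer for each language.

```javascript
// Hypothetical helper: converts the MongoDB article shape into a flat document
// with one title field per language. The MongoDB document is not modified.
function toIndexDocument(article) {
  const doc = { created: article.created };
  for (const { language, value } of article.title) {
    // Assumed naming scheme; the mapping needs a matching field,
    // for example title_english with the "english" analyzer.
    doc[`title_${language}`] = value;
  }
  return doc;
}

const demoArticle = {
  created: new Date(),
  title: [
    { language: 'english', value: 'This is the english title' },
    { language: 'dutch', value: 'Dit is de nederlandse titel' }
  ]
};

console.log(toIndexDocument(demoArticle));
```

Because the reshaping happens only on the copy that gets indexed, the production database keeps its current structure.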
