Elasticsearch setting up stemming and analyzer questions

Question

I'm using ES with my node server via the package "elasticsearch": "12.1.3". I do bulk inserts of my documents. Excerpt:

var body = [];
_.each(rows, function(doc) {
    body.push({
        update: {
            _index: 'mytest',
            _type: 'mydoc',
            _id: doc.id,
            _retry_on_conflict: 3
        }
    });
    body.push({
        doc: doc,
        doc_as_upsert: true
    });
});
client.bulk({
    body: body
}, ...

On demand, to individually update documents, I have this in place:

client.index({
    index: 'mytest',
    type: 'mydoc',
    id: doc.id,
    body: doc.body
}, ...);

Everything works as expected so far. Now I'm trying to add basic 'light_english' stemming. I've been looking at the docs here and the docs for the JS package here.

I want certain fields in my documents to be "fuzzy" matched, so I think stemming is the way to go?

It is not clear to me how I would set this up. Assuming I use the example settings from the link above, would this be the right way to do it:

client.cluster.putSettings({
    "settings": {
        "analysis": {
            "filter": {
                "no_stem": {
                    "type": "keyword_marker",
                    "keywords": [ "skies" ] 
                }
            },
            "analyzer": {
                "my_english": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "no_stem",
                        "porter_stem"
                    ]
                }
            }
        }
    }
});

And would this then work permanently for my two code examples above, if applied once?
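
For reference, a minimal sketch of an alternative, resting on one assumption worth stating loudly: in Elasticsearch, analysis settings (analyzers and token filters) are index-level settings rather than cluster-level ones, so they would normally be supplied when the index is created and then referenced from a field mapping. The field name `title` and the helpers `buildIndexBody` and `createIndex` are hypothetical, and the `text` field type assumes ES 5.x (2.x used `string`).

```javascript
// Sketch: build the request body for creating 'mytest' with a custom analyzer.
// Assumption: analysis settings live on the index, not on the cluster.
function buildIndexBody() {
    return {
        settings: {
            analysis: {
                filter: {
                    // Words listed here are protected from stemming
                    no_stem: { type: 'keyword_marker', keywords: ['skies'] }
                },
                analyzer: {
                    my_english: {
                        tokenizer: 'standard',
                        filter: ['lowercase', 'no_stem', 'porter_stem']
                    }
                }
            }
        },
        mappings: {
            mydoc: {
                properties: {
                    // 'title' is a hypothetical field name
                    title: { type: 'text', analyzer: 'my_english' }
                }
            }
        }
    };
}

// Intended to run once, before any documents are indexed.
// `client` is expected to be an elasticsearch.Client instance.
function createIndex(client) {
    return client.indices.create({ index: 'mytest', body: buildIndexBody() });
}
```

If this assumption holds, the analyzer is stored with the index and its mapping, so the bulk and single-document calls above should not need to change. Documents indexed before the mapping existed would most likely have to be reindexed to pick it up.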

Bonus question: What would be a good default analyzer plugin (or settings) I can use? My main goal is that a search for, e.g., "Günther" would also match "gunther" and vice versa.
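
One possible sketch for the "Günther"/"gunther" case, assuming the built-in `lowercase` and `asciifolding` token filters are acceptable for the fields in question. The analyzer name `folding` and the helper `buildFoldingAnalysis` are hypothetical; the returned object is meant to go under the index `settings`.

```javascript
// Sketch: analysis settings for a case- and diacritic-insensitive analyzer.
function buildFoldingAnalysis() {
    return {
        analysis: {
            analyzer: {
                // 'folding' is a hypothetical analyzer name
                folding: {
                    type: 'custom',
                    tokenizer: 'standard',
                    // lowercase first, then map characters such as "ü" to ASCII "u"
                    filter: ['lowercase', 'asciifolding']
                }
            }
        }
    };
}
```

When the same analyzer is applied at index time and at search time, both "Günther" and "gunther" should reduce to the token "gunther", which is what makes the match work in either direction.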

Might it be better to do this manually before inserting/updating documents, so that strings are lower-cased, diacritics removed etc.?
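
For comparison, the manual route could be sketched roughly like this, assuming a Node version that supports `String.prototype.normalize`. The helper name `normalizeText` is hypothetical.

```javascript
// Sketch: lower-case a string and strip combining diacritical marks.
function normalizeText(value) {
    return value
        .normalize('NFD')                 // split "ü" into "u" + combining diaeresis
        .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
        .toLowerCase();
}
```

With this approach the same function would also have to be applied to every search string, and characters without a canonical decomposition (for example "ß" or "ø") pass through unchanged, so it is only a partial substitute for an analyzer.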

After some more digging I found these two links: [Update Indices Settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html#indices-update-settings) and [Analyzers](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html) This might be the way to go. I will play around with it to see if I get the desired result. — bendulum, Jun 03 '17 at 15:02
Related: Substring matching https://stackoverflow.com/a/23244323/3248285 — bendulum, Jun 03 '17 at 16:50
