I'm using ES with my node server via the package "elasticsearch": "12.1.3".
I do bulk inserts of my documents. Excerpt:
    var body = [];
    _.each(rows, function(doc) {
      body.push({
        update: {
          _index: 'mytest',
          _type: 'mydoc',
          _id: doc.id,
          _retry_on_conflict: 3
        }
      });
      body.push({
        doc: doc,
        doc_as_upsert: true
      });
    });
    client.bulk({
      body: body
    }, ...
On demand, to individually update documents, I have this in place:
    client.index({
      index: 'mytest',
      type: 'mydoc',
      id: doc.id,
      body: doc.body
    }, ...);
Everything works as expected so far. Now I'm trying to add basic 'light_english' stemming, going by the docs here and the docs for the JS package here.
I want certain fields in my documents to be "fuzzy" matched, so I think stemming is the way to go?
It is not clear to me how I would set this up. Assuming I use the example settings from the link above, would this be the right way to do it:
    client.cluster.putSettings({
      "settings": {
        "analysis": {
          "filter": {
            "no_stem": {
              "type": "keyword_marker",
              "keywords": [ "skies" ]
            }
          },
          "analyzer": {
            "my_english": {
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "no_stem",
                "porter_stem"
              ]
            }
          }
        }
      }
    });
And would this then work permanently for my two code examples above, if applied once?
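For comparison, here is a minimal sketch of the same analyzer supplied when the index is created, on the understanding that analysis settings are index-level rather than cluster-level. The field name `title`, the helper names, and the `text` field type (which assumes ES 5.x; older versions use `string`) are assumptions for illustration, not something taken from the docs example.

```javascript
// Sketch only: builds the request for client.indices.create so that the
// analyzer lives in the index settings. The field name "title" is hypothetical.
function buildCreateIndexRequest() {
  return {
    index: 'mytest',
    body: {
      settings: {
        analysis: {
          filter: {
            no_stem: { type: 'keyword_marker', keywords: ['skies'] }
          },
          analyzer: {
            my_english: {
              tokenizer: 'standard',
              filter: ['lowercase', 'no_stem', 'porter_stem']
            }
          }
        }
      },
      mappings: {
        mydoc: {
          properties: {
            // Assumption: "text" is the field type on ES 5.x; older
            // versions used "string".
            title: { type: 'text', analyzer: 'my_english' }
          }
        }
      }
    }
  };
}

// Assumption: `client` is an elasticsearch.Client instance.
function createIndexWithAnalyzer(client, callback) {
  client.indices.create(buildCreateIndexRequest(), callback);
}
```

If this is right, the analyzer would be stored with the index and applied to the mapped field on every write, whether the document arrives through the bulk update or through `client.index`. Documents indexed before the mapping existed would presumably need to be reindexed.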
Bonus question: What would be a good default analyzer plugin (or settings) I can use? My main goal is that a search for, e.g., "Günther" also matches "gunther" and vice versa.
Might it be better to do this manually before inserting/updating documents, so that strings are lower-cased, diacritics removed etc.?
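To make the manual option concrete, this is roughly what I have in mind: a minimal sketch, assuming `String.prototype.normalize` is available in the Node version in use. The helper name `normalizeForSearch` is hypothetical. It only strips combining marks, so characters without a decomposition (for example "ß" or "ø") pass through unchanged, and the search terms would have to go through the same function as the stored values.

```javascript
// Lower-case a string and strip diacritics by decomposing to NFD and
// removing the combining marks (U+0300 to U+036F).
function normalizeForSearch(value) {
  return String(value)
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase();
}

console.log(normalizeForSearch('Günther')); // "gunther"
console.log(normalizeForSearch('gunther')); // "gunther"
```

As far as I can tell, this overlaps with what the built-in `lowercase` and `asciifolding` token filters would do inside an analyzer, which is part of why I'm unsure which side should own it.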