Nest update index settings

Question

I am following the post Creating an index Nest and trying to update my index settings. Everything runs fine; however, the html_strip char filter is not stripping HTML. My code is:

var node = new Uri(_url + ":" + _port);
var settings = new ConnectionSettings(node);
settings.SetDefaultIndex(index);
_client = new ElasticClient(settings);

//to apply filters during indexing use folding to remove diacritics and html strip to remove html
_client.UpdateSettings(
        f => f.Analysis(descriptor => descriptor
                .Analyzers(
                        bases => bases
                        .Add("folded_word", new CustomAnalyzer
                        {
                                Filter = new List<string> { "icu_folding", "trim" },
                                Tokenizer = "standard"
                        }
                        )
                        )
                .CharFilters(
                        cf => cf.Add("html_strip", new HtmlStripCharFilter())
                        )
                )
        );

score 2 · Accepted Answer · edited Oct 05 '18 at 11:32


You are getting this error:

Can't update non dynamic settings[[index.analysis.analyzer.folded_word.filter.0, index.analysis.char_filter.html_strip.type, index.analysis.analyzer.folded_word.filter.1, index.analysis.analyzer.folded_word.type, index.analysis.analyzer.folded_word.tokenizer]] for open indices[[my_index]]

Analysis settings are not dynamic, so they cannot be updated while the index is open. Close the index first, update the settings, and then reopen it:

client.CloseIndex(..);
client.UpdateSettings(..);
client.OpenIndex(..);
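For reference, the same close, update, reopen sequence can be expressed against the Elasticsearch REST API (`POST /{index}/_close`, `PUT /{index}/_settings`, `POST /{index}/_open`). The sketch below only builds the three requests in order; it assumes an index called `my_index` and the `folded_word` analyzer discussed in this answer, and the helper name `BuildAnalysisUpdate` is hypothetical. Actually sending the requests (for example with `HttpClient`) needs a running cluster, and `icu_folding` needs the ICU analysis plugin.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the three REST calls needed to change non-dynamic
// analysis settings, in the order they have to be sent.
List<(string Method, string Path, string? Body)> BuildAnalysisUpdate(string index)
{
    // Custom analyzer that references the built-in html_strip char filter.
    // icu_folding is provided by the ICU analysis plugin, which must be installed.
    string settings = @"{
  ""analysis"": {
    ""analyzer"": {
      ""folded_word"": {
        ""type"": ""custom"",
        ""char_filter"": [""html_strip""],
        ""tokenizer"": ""standard"",
        ""filter"": [""icu_folding"", ""trim""]
      }
    }
  }
}";

    return new List<(string Method, string Path, string? Body)>
    {
        ("POST", $"/{index}/_close", null),        // 1. close: analysis settings are not dynamic
        ("PUT", $"/{index}/_settings", settings),  // 2. apply the analysis settings
        ("POST", $"/{index}/_open", null),         // 3. reopen for reads and writes
    };
}

var requests = BuildAnalysisUpdate("my_index");
foreach (var (method, path, body) in requests)
{
    // Only prints the plan; sending these requires a running cluster.
    Console.WriteLine($"{method} {path}{(body is null ? "" : " + JSON body")}");
}
```

Keeping the three calls together in one list makes the ordering explicit: the settings update only succeeds between the close and the reopen.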

UPDATE

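Defining the html_strip char filter in the index settings is not enough on its own: it only takes effect once an analyzer references it. Add the html_strip char filter to your custom analyzer: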

.Analysis(descriptor => descriptor
                    .Analyzers(bases => bases.Add("folded_word",
                        new CustomAnalyzer
                        {
                            Filter = new List<string> { "icu_folding", "trim" }, 
                            Tokenizer = "standard", 
                            CharFilter = new List<string> { "html_strip" }
                        }))
                )

Now you can run a test to check whether this analyzer returns the correct tokens:

client.Analyze(a => a.Index(indexName).Text("this <a> is a test <div>").Analyzer("folded_word"));

Output:

this
is
a
test
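To see why the char filter has to be part of the analyzer, here is a toy, in-memory approximation of the `folded_word` chain: char filters run on the raw text first, then the tokenizer, then the token filters in the order listed. This is not Elasticsearch's implementation: a regex stands in for html_strip, splitting on non-letter/digit characters is a rough stand-in for the standard tokenizer, and removing diacritics plus lowercasing only approximates icu_folding. The function names `Analyze` and `Fold` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Toy approximation of the "folded_word" analyzer; NOT Elasticsearch code.
List<string> Analyze(string text)
{
    // 1. Char filter (stand-in for html_strip): replace tags with a space
    //    before tokenization, so "<a>" never reaches the tokenizer.
    string stripped = Regex.Replace(text, "<[^>]*>", " ");

    // 2. Tokenizer (rough stand-in for "standard"): split on runs of
    //    characters that are not letters, digits, or combining marks.
    var tokens = Regex.Split(stripped, @"[^\p{L}\p{Nd}\p{M}]+")
                      .Where(t => t.Length > 0);

    // 3. Token filters in the listed order: "icu_folding" (approximated),
    //    then "trim".
    return tokens.Select(t => Fold(t)).Select(t => t.Trim()).ToList();
}

string Fold(string token)
{
    var sb = new StringBuilder();
    foreach (char c in token.Normalize(NormalizationForm.FormD))
    {
        // Drop combining marks such as the acute accent in "é".
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC).ToLowerInvariant();
}

var result = Analyze("this <a> is a test <div>");
Console.WriteLine(string.Join(", ", result)); // prints: this, is, a, test
```

If the regex step were defined but never called inside `Analyze`, the tags would simply survive into the tokenizer, which is the situation the original settings were in.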

Hope it helps.

edited Oct 05 '18 at 11:32 by Frederik Struck-Schøning

answered Jun 19 '15 at 11:13 by Rob

Rob, many thanks, your suggestion worked and I can see the filter. However, during indexing the HTML is not being stripped. – Ismail Jun 19 '15 at 13:53
@Ismail could you share the index mapping? – Rob Jun 19 '15 at 13:54
`{ umbracotest: { settings: { index: { uuid: "eb3hMpFrS8qyb3DxHZ4_eg", analysis: { char_filter: { html_strip: { type: "html_strip" } } }, number_of_replicas: "1", number_of_shards: "5", version: { created: "1020099" } } } } }` – Ismail Jun 19 '15 at 13:57
Rob, your update and the Analyze test work, many thanks. However, when I index, the HTML is still there. When calling Index, do you have to pass in which analyser to use? I am assuming it is inferred from what is set during client init? – Ismail Jun 22 '15 at 11:39

@Ismail I think I understand your concern now. Your content with HTML tags has been indexed using the folded_word analyzer, but what a search returns is the original content (the document's _source), not the indexed tokens. Hope that's clear enough. [Here](https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html) you can find more info on how Elasticsearch works under the hood. – Rob Jun 22 '15 at 11:53
[This](http://stackoverflow.com/questions/15299799/elasticsearch-impact-of-setting-a-not-analyzed-field-as-storeyes/15320692#15320692) one should be quite useful too. – Rob Jun 22 '15 at 12:02
Rob, ah, that makes sense: I was querying with Sense and still seeing the HTML. Once again, many thanks for your help. – Ismail Jun 22 '15 at 12:13
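Rob's point, that a search hit returns the original document rather than the analyzed tokens, can be illustrated with a toy in-memory index: the analyzed terms only live in the inverted index used for matching, while the hit hands back the stored original text (in Elasticsearch, the _source), HTML included. Everything here (`AddDocument`, `SearchTerm`, and the simplified `Analyze`) is hypothetical and only meant to show the idea, not how Elasticsearch is built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical toy index: NOT how Elasticsearch is implemented internally.
var source = new Dictionary<int, string>();             // like _source: original text, stored as-is
var inverted = new Dictionary<string, HashSet<int>>();  // term -> ids of documents containing it

// Simplified analyzer: strip tags, split on non-letters/digits, lowercase.
IEnumerable<string> Analyze(string text) =>
    Regex.Split(Regex.Replace(text, "<[^>]*>", " "), @"[^\p{L}\p{Nd}]+")
         .Where(t => t.Length > 0)
         .Select(t => t.ToLowerInvariant());

void AddDocument(int id, string text)
{
    source[id] = text;                    // the original content is kept untouched
    foreach (var term in Analyze(text))   // only the analyzed terms are indexed
    {
        if (!inverted.TryGetValue(term, out var ids))
            inverted[term] = ids = new HashSet<int>();
        ids.Add(id);
    }
}

List<string> SearchTerm(string word)
{
    var hits = inverted.TryGetValue(word.ToLowerInvariant(), out var ids) ? ids : new HashSet<int>();
    // A hit returns the stored original, so the HTML is still visible.
    return hits.Select(id => source[id]).ToList();
}

AddDocument(1, "<p>this is a <b>test</b></p>");
Console.WriteLine(string.Join(" | ", SearchTerm("test"))); // prints: <p>this is a <b>test</b></p>
Console.WriteLine(SearchTerm("b").Count);                  // prints: 0 (the tag name was never indexed)
```

The match happens on the stripped token "test", yet the returned document still contains the markup, which is exactly what Sense was showing.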
