Elasticsearch cannot find standalone reserved characters

Question

I use Kibana to run Elasticsearch queries (query string query).

When I search for a word that includes escapable characters (reserved characters like: '\', '+', '-', '&&', '||', '!', '(', ')', '{', '}', '[', ']', '^', '"', '~', '*', '?', ':', '/'), I get the expected result. My example uses '!'.

But when I search for a single reserved character, I get nothing.

How can I search for a single reserved character?

score 3 · Accepted Answer · answered Jul 05 '21 at 16:08

TL;DR You'll need to specify an analyzer (+ a tokenizer) which ensures that special chars like `!` won't be stripped away during the ingestion phase.

In the first screenshot you've correctly tried running _analyze. Let's use it to our advantage.

See, when you don't specify any analyzer, ES will default to the standard analyzer which is, by definition, constrained by the standard tokenizer which'll strip away any special chars (except the apostrophe ' and some other chars).

Running

GET dev_application/_analyze?filter_path=tokens.token
{
  "tokenizer": "standard",
  "text": "Se, det ble grønt ! a"
}

thus yields:

["Se", "det", "ble", "grønt", "a"]

This means you'll need to use some other tokenizer which'll preserve these chars instead. There are a few built-in ones available, the simplest of which would be the whitespace tokenizer.

Running

GET _analyze?filter_path=tokens.token
{
  "tokenizer": "whitespace",
  "text": "Se, det ble grønt ! a"
}

retains the !:

["Se,", "det", "ble", "grønt", "!", "a"]
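To see the difference between the two tokenizers without a running cluster, here is a small Python sketch. `whitespace_tokens` mirrors what the whitespace tokenizer does for this input. `rough_standard_tokens` is only an approximation I'm assuming for illustration: the real standard tokenizer follows Unicode text segmentation and behaves differently in edge cases.

```python
import re

TEXT = "Se, det ble grønt ! a"


def whitespace_tokens(text: str) -> list[str]:
    # Split on runs of whitespace only, so punctuation survives as-is.
    return text.split()


def rough_standard_tokens(text: str) -> list[str]:
    # Rough stand-in for the standard tokenizer: keep runs of word characters.
    # This is an approximation, not the actual UAX #29 segmentation rules.
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(whitespace_tokens(TEXT))
    print(rough_standard_tokens(TEXT))
```

The standalone `!` only survives in the whitespace version, which is why a query for it can match there and nowhere else.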

So,

1. Drop your index:

DELETE dev_application

2. Then set the mappings anew:

(I chose the multi-field approach which'll preserve the original, standard analyzer and only apply the whitespace tokenizer on the name.splitByWhitespace subfield.)

PUT dev_application
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "splitByWhitespaceAnalyzer": {
            "tokenizer": "whitespace"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "splitByWhitespace": {
            "type": "text",
            "analyzer": "splitByWhitespaceAnalyzer"
          }
        }
      }
    }
  }
}

3. Reindex:

POST dev_application/_doc
{
  "name": "Se, det ble grønt ! a"
}

4. Search freely for special chars:

GET dev_application/_search
{
  "query": {
    "query_string": {
      "default_field": "name.splitByWhitespace", 
      "query": "*\\!*",
      "default_operator": "AND"
    }
  }
}

Do note that if you leave the default_field out, it won't work because of the standard analyzer.
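If you build the request body in application code, escaping by hand gets error-prone. Here's a minimal Python sketch of that step. The helper names `escape_query_string` and `contains_query` are hypothetical, and the reserved set is my reading of the query string syntax docs (`<` and `>` are left out because they cannot be escaped at all).

```python
import json

# Characters the query_string syntax treats as operators.
# "<" and ">" are omitted on purpose: they cannot be escaped, only removed.
RESERVED = set('\\+-=&|!(){}[]^"~*?:/')


def escape_query_string(text: str) -> str:
    # Prefix every reserved character with a single backslash.
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def contains_query(term: str, field: str = "name.splitByWhitespace") -> dict:
    # Same shape as the search request above: wildcard on both sides
    # of the escaped term, restricted to the whitespace-analyzed subfield.
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": f"*{escape_query_string(term)}*",
                "default_operator": "AND",
            }
        }
    }


if __name__ == "__main__":
    # json.dumps doubles the backslash, giving "*\\!*" as in the request above.
    print(json.dumps(contains_query("!"), indent=2))
```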

Indeed, you could reverse this approach, apply whitespace by default, and create a multi-field mapping for the "original" indexing strategy (-> the only config being "type": "text").
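For illustration, the reversed layout could look roughly like the body built below. This is a sketch, not something I've run against your data: the subfield name `standard` and the function name `reversed_index_body` are made up for the example, and the dict would be sent as the body of `PUT dev_application`.

```python
import json


def reversed_index_body(analyzer_name: str = "splitByWhitespaceAnalyzer") -> dict:
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        # Custom analyzer built on the whitespace tokenizer.
                        analyzer_name: {"tokenizer": "whitespace"}
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    # The main field now keeps special chars such as "!".
                    "analyzer": analyzer_name,
                    "fields": {
                        # Hypothetical subfield name; with no analyzer set,
                        # it falls back to the standard analyzer.
                        "standard": {"type": "text"}
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(reversed_index_body(), indent=2))
```

With this layout you'd search `name` for special chars and `name.standard` for the original behaviour.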

Shameless plug: I wrote a book on Elasticsearch and you may find it useful!

score -1 · Answer 2 · answered Jun 28 '21 at 08:34

Standard analyzer

The standard analyzer is the default analyzer which is used if none is specified. It provides grammar based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.

So no token is generated for a hyphen. If you want to find text containing a hyphen, you need to query a keyword field and use a wildcard for a full-text match:

{
  "query": {
    "query_string": {
      "query": "*\\-*"
    }
  }
}

jaspreet chahal

My search function splits the input into single words and searches with an AND condition, and the last word uses a wildcard. Example: - My input: hello - guy => split "hello", "//-", "guy*" - Expected result: hello - guy nice to meet you... – Viet Dinh Jun 28 '21 at 08:40
@VietDinh I am not able to understand your query. If the text in your document is "hello - guy nice to meet you...", do you want to search for "hello - guy" and have this document returned? – jaspreet chahal Jun 29 '21 at 14:16
Yes. But I also want a document whose content is "hello some one - abc guy nice to meet you.." to be returned. – Viet Dinh Jun 30 '21 at 08:33
@VietDinh Is it required that "-" be present, or should a document with "hello guy" be returned too? – jaspreet chahal Jun 30 '21 at 08:36
Yes, "-" is required in the document! A document with "hello guy" must not be returned. – Viet Dinh Jun 30 '21 at 09:45
I have built a sample for your solution, but it does not work. – Viet Dinh Jul 01 '21 at 07:57
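For the split-and-AND search described in the comments above, a minimal Python sketch of building the query text might look like this. The function names are hypothetical, and it assumes the query runs against a whitespace-tokenized field, so that a standalone "-" exists as a token in the index.

```python
# Characters the query_string syntax treats as operators
# ("<" and ">" are omitted because they cannot be escaped).
RESERVED = set('\\+-=&|!(){}[]^"~*?:/')


def escape_term(term: str) -> str:
    # Backslash-escape each reserved character in a single word.
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


def build_and_query(user_input: str) -> str:
    # Split on whitespace, escape each word, add a trailing wildcard
    # to the last word only, then require all words with AND.
    words = user_input.split()
    if not words:
        return ""
    escaped = [escape_term(word) for word in words]
    escaped[-1] += "*"
    return " AND ".join(escaped)


if __name__ == "__main__":
    print(build_and_query("hello - guy"))
```

Because every word is required, a document containing only "hello guy" would not satisfy the escaped "-" clause on such a field.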