When you create a sample doc and let ES auto-generate the mapping for you,
POST comments/_doc
{
"comment": "?"
}
running
GET comments/_mapping
will get you
"comment":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword",
"ignore_above":256
}
}
}
Now, the text
type's analyzer is usually standard
by default.
When we attempt to see how our non-standard chars got analyzed
GET comments/_analyze
{
"text": "?",
"analyzer": "standard"
}
the result is
{
"tokens" : [ ]
}
meaning we cannot search for its contents using the standard-analyzed text
field but need to
either define a different default analyzer
or define this analyzer in one of the comment's fields
Going with the 2nd approach (since it's good practice to keep differently-analyzed fields separate),
PUT comments2
{
"mappings": {
"properties": {
"comment": {
"type": "text",
"fields": {
"whitespace_analyzed": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
}
}
POST comments2/_doc
{
"comment": "?"
}
After verifying
GET comments2/_analyze
{
"text": "?",
"analyzer": "whitespace"
}
we can do the following in KQL
comment.whitespace_analyzed:"?"
Note that there are a bunch of built-in analyzers to choose from but you're more than welcome to create your own.