While working on one of our users' queries, I initially assumed he was running the latest version of Elasticsearch; then he shared the output of the Analyze API, and the result was surprising.
The custom analyzer whose tokens needed to be checked:
{
  "settings": {
    "analysis": {
      "filter": {
        "splcharfilter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([?/])"
          ]
        }
      },
      "analyzer": {
        "splcharanalyzer": {
          "tokenizer": "keyword",
          "filter": [
            "splcharfilter",
            "lowercase"
          ]
        }
      }
    }
  }
}
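For context, here is what this analyzer is meant to do: the keyword tokenizer emits the whole input as a single token, the pattern_capture filter with preserve_original set to true adds one extra token per captured group (here a ? or / character), and lowercase lowercases everything. For the text And/or, the expected output is therefore something along these lines (a sketch based on the documented filter behavior, offsets omitted; I have not re-run it against a cluster):
{
  "tokens": [
    { "token": "and/or" }, --> the preserved original token
    { "token": "/" } --> the captured "/" group
  ]
}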
Analyze API
POST /_analyze
{
  "analyzer": "splcharanalyzer",
  "text": "And/or"
}
Output
{
  "tokens": [
    {
      "token": "analyzer", --> why this token
      "start_offset": 7,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "splcharanalyzer", --> why this token
      "start_offset": 19,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "text", --> why this token
      "start_offset": 42,
      "end_offset": 46,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "and",
      "start_offset": 51,
      "end_offset": 54,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "or",
      "start_offset": 58,
      "end_offset": 60,
      "type": "<ALPHANUM>",
      "position": 5
    }
  ]
}
As clearly shown above, the request generates a number of tokens that simply should not be there. When I checked with the user, he mentioned that he was running Elasticsearch 1.7 but had followed the request syntax documented for the latest version of Elasticsearch. That explains the output: in 1.x the _analyze API does not parse a JSON body at all; it treats the entire body as the text to analyze, and since no analyzer was picked up from the request, the default standard analyzer was applied. The tokens analyzer, splcharanalyzer, and text are literally the field names and values of the JSON body itself, and the start/end offsets line up with the positions of those words inside the raw body; And/or was then split into and and or by the standard tokenizer.
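The fix is to use the request format that Elasticsearch 1.x actually understands: the analyzer goes in the query string and the body is the raw text to analyze (the text can also be passed as a text query parameter). Because splcharanalyzer is defined in the index settings, the request must target that index as well; assuming the index is named my_index (a hypothetical name, the user's actual index name was not shared), the equivalent 1.7 request would be:
GET /my_index/_analyze?analyzer=splcharanalyzer
And/or
With this form the body is no longer mistaken for the text of a JSON request, and the custom analyzer is actually applied.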