I want to perform substring/partial-word matching using Elasticsearch, and I want the results returned in a particular order. To explain my problem, I will show how I create my index and mappings, and which records I use.
Creating Index and mappings:
PUT /my_index1
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"trigrams_filter"
]
}
}
}
},
"mappings": {
"my_type1": {
"properties": {
"text": {
"type": "string",
"analyzer": "trigrams"
}
}
}
}
}
Bulk record insert:
POST /my_index1/my_type1/_bulk
{ "index": { "_id": 1 }}
{ "text": "men's shaver" }
{ "index": { "_id": 2 }}
{ "text": "men's foil shaver" }
{ "index": { "_id": 3 }}
{ "text": "men's foil advanced shaver" }
{ "index": { "_id": 4 }}
{ "text": "norelco men's foil advanced shaver" }
{ "index": { "_id": 5 }}
{ "text": "men's shavers" }
{ "index": { "_id": 6 }}
{ "text": "women's shaver" }
{ "index": { "_id": 7 }}
{ "text": "women's foil shaver" }
{ "index": { "_id": 8 }}
{ "text": "women's foil advanced shaver" }
{ "index": { "_id": 9 }}
{ "text": "norelco women's foil advanced shaver" }
{ "index": { "_id": 10 }}
{ "text": "women's shavers" }
Now I want to search for "en's shaver". I'm using the following query:
POST /my_index1/my_type1/_search
{
"query": {
"match": {
"text":
{ "query": "en's shaver",
"minimum_should_match": "100%"
}
}
}
}
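To show what I think the trigrams analyzer does to this query, here is a rough Python approximation (not Elasticsearch itself). The regex is only my stand-in for the standard tokenizer, and the names `trigrams`, `docs` and `matching` are made up for this illustration. If this emulation is right, every one of the ten documents contains all six query trigrams, so in terms of raw trigram overlap the 100% requirement does not separate them.

```python
import re

def trigrams(text, n=3):
    # Assumption: approximate the standard tokenizer with a regex that keeps
    # inner apostrophes, then lowercase and cut each word into n-character grams.
    words = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    grams = []
    for word in words:
        grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

docs = [
    "men's shaver", "men's foil shaver", "men's foil advanced shaver",
    "norelco men's foil advanced shaver", "men's shavers",
    "women's shaver", "women's foil shaver", "women's foil advanced shaver",
    "norelco women's foil advanced shaver", "women's shavers",
]

query_grams = trigrams("en's shaver")
print(query_grams)  # ["en'", "n's", 'sha', 'hav', 'ave', 'ver']

# Documents whose trigram set covers every query trigram
matching = [d for d in docs if set(query_grams) <= set(trigrams(d))]
print(len(matching))  # 10
```

So the ordering I get seems to come purely from scoring, not from which documents match.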
I want the results in the following sequence:
- men's shaver --> closest match, following the same keyword order as "en's shaver"
- women's shaver --> closest match, following the same keyword order as "en's shaver"
- men's foil shaver --> increased distance by 1
- women's foil shaver --> increased distance by 1
- men's foil advanced shaver --> increased distance by 2
- women's foil advanced shaver --> increased distance by 2
- men's shavers --> substring match for "shavers"
- women's shavers --> substring match for "shavers"
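To pin down the ordering I have in mind, here is a small Python sketch of the sort rule. It only describes the ranking I want; it is not something Elasticsearch does, and the function name `sort_key` and the tuple it returns are my own invention. I left out the two norelco documents because the list above does not place them.

```python
def sort_key(text, query="en's shaver"):
    first, last = query.split()
    words = text.split()
    # Word that contains the first query term as a substring
    start = next(i for i, w in enumerate(words) if first in w)
    # Later word that contains the last query term as a substring
    end = next(i for i, w in enumerate(words) if i > start and last in w)
    gap = end - start - 1          # number of words between the two matches
    exact = words[end] == last     # False when only a substring matched ("shavers")
    # Exact "shaver" matches first, then smaller gaps first
    return (not exact, gap)

docs = [
    "men's shaver", "men's foil shaver", "men's foil advanced shaver",
    "men's shavers", "women's shaver", "women's foil shaver",
    "women's foil advanced shaver", "women's shavers",
]

# sorted() is stable, so ties keep the input order (men's before women's)
ranked = sorted(docs, key=sort_key)
for text in ranked:
    print(text)
```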
I'm also running the following query, but it does not give me the results in the order I want:
POST /my_index1/my_type1/_search
{
"query": {
"query_string": {
"default_field": "text",
"query": "men's shaver",
"minimum_should_match": "90%"
}
}
}
Please suggest how to achieve the above result. Any suggestion will help.
*************************** UPDATE (6th May, 2014) ********************************
I made some changes:
1. using a multi-field
2. using only one shard
3. using analyzers, filters and stemmers
Please see my settings below:
For index:
curl -XPUT "http://localhost:9200/my_improved_index" -d'
{
"settings": {
"analysis": {
"filter": {
"trigrams_filter": {
"type": "ngram",
"min_gram": 1,
"max_gram": 50
},
"my_stemmer" : {
"type" : "stemmer",
"name" : "minimal_english"
}
},
"analyzer": {
"trigrams": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"trigrams_filter"
]
},
"my_stemmer_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": [
"standard",
"lowercase",
"my_stemmer"
]
}
}
}
}
}'
For mappings:
curl -XPUT "http://localhost:9200/my_improved_index/my_improved_index_type/_mapping" -d'
{
"my_improved_index_type": {
"properties": {
"name": {
"type": "multi_field",
"fields": {
"name_gram": {
"type": "string",
"analyzer": "trigrams"
},
"untouched": {
"type": "string",
"index": "not_analyzed"
},
"name_stemmer":{
"type": "string",
"analyzer": "my_stemmer_analyzer"
}
}
}
}
}
}'
Available documents:
- men's shaver
- men's shavers
- men's foil shaver
- men's foils shaver
- men's foil shavers
- men's foils shavers
- men's foil advanced shaver
- norelco men's foil advanced shaver
Query:
curl -XPOST "http://localhost:9200/my_improved_index/my_improved_index_type/_search" -d'
{
"size": 30,
"query": {
"bool": {
"should": [
{
"match": {
"name.untouched": {
"query": "men\"s shaver",
"operator": "and",
"type": "phrase",
"boost": "10"
}
}
},
{
"match_phrase": {
"name.name_stemmer": {
"query": "men\"s shaver",
"slop": 5
}
}
}
]
}
}
}'
Returned result:
- men's shaver --> correct
- men's shavers --> correct
- men's foils shaver --> NOT correct
- norelco men's foil advanced shaver --> NOT correct
- men's foil advanced shaver --> NOT correct
- men's foil shaver --> NOT correct
Expected result:
- men's shaver --> exact phrase match
- men's shavers --> ZERO word distance + 1 plural
- men's foil shaver --> 1 word distance
- men's foils shaver --> 1 word distance + 1 plural
- men's foil advanced shaver --> 2 word distance
- norelco men's foil advanced shaver --> 2 word distance
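Expressed as code, the ranking I expect looks roughly like the Python sketch below. It is only a description of the desired order, not Elasticsearch scoring. `naive_stem` is a crude stand-in I wrote for this illustration (it is not the minimal_english stemmer), and `sort_key` with its tuple is likewise my own assumption. The input list is the order Elasticsearch returned; sorting it gives the order I expect.

```python
def naive_stem(word):
    # Crude stand-in for a plural stemmer: drop a trailing "s",
    # but leave possessives such as "men's" untouched.
    if word.endswith("s") and not word.endswith("'s"):
        return word[:-1]
    return word

def sort_key(text, query="men's shaver"):
    q_first, q_last = [naive_stem(w) for w in query.split()]
    words = text.split()
    stems = [naive_stem(w) for w in words]
    start = stems.index(q_first)
    end = stems.index(q_last)
    distance = end - start - 1                                # words between the two terms
    plurals = sum(1 for w, s in zip(words, stems) if w != s)  # words changed by stemming
    # Smaller distance first, then fewer plurals, then shorter documents
    return (distance, plurals, len(words))

returned = [
    "men's shaver", "men's shavers", "men's foils shaver",
    "norelco men's foil advanced shaver", "men's foil advanced shaver",
    "men's foil shaver",
]

expected = sorted(returned, key=sort_key)
for text in expected:
    print(text)
```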
Why do documents with a higher word distance score higher? How can I achieve the expected result? Is there a problem with my stemmer or nGram settings?