I'm new to Elasticsearch. I want span_near-style matching that ranks exact phrase matches first, then exact in-order word matches, and then substring matches.
For example, I have these documents in my index:
- men's cream
- men's wrinkle cream
- men's advanced wrinkle cream
- women's cream
- women's wrinkle cream
- women's advanced wrinkle cream
If I search for "men's cream", I want the results in the same order as listed above. Expected search result:
- men's cream --> exact phrase match
- men's wrinkle cream --> search term sequence with slop 1
- men's advanced wrinkle cream --> search term sequence with slop 2
- women's cream --> substring near to exact phrase match
- women's wrinkle cream --> substring search term sequence with slop 1
- women's advanced wrinkle cream --> substring search term sequence with slop 2
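To make the ordering I'm after concrete, here is a small sketch of the ranking rule in plain Python (not Elasticsearch). The names `match_rank` and `rank` are made up for illustration, and it assumes simple whitespace tokenization, which is probably not what the analyzer really does.

```python
from typing import List, Optional, Tuple

DOCS = [
    "men's cream",
    "men's wrinkle cream",
    "men's advanced wrinkle cream",
    "women's cream",
    "women's wrinkle cream",
    "women's advanced wrinkle cream",
]


def match_rank(query: str, doc: str) -> Optional[Tuple[int, int]]:
    """Return (tier, slop) for the best in-order match, or None if no match.

    tier 0: every query term equals a document token.
    tier 1: at least one query term only matches as a substring of a token.
    slop:   number of unmatched tokens inside the matched span.
    """
    terms = query.lower().split()   # assumption: whitespace tokenization
    tokens = doc.lower().split()
    if not terms:
        return None
    best: Optional[Tuple[int, int]] = None

    def search(term_idx: int, start: int, positions: List[int], tier: int) -> None:
        nonlocal best
        if term_idx == len(terms):
            slop = (positions[-1] - positions[0] + 1) - len(terms)
            if best is None or (tier, slop) < best:
                best = (tier, slop)
            return
        for pos in range(start, len(tokens)):
            if terms[term_idx] == tokens[pos]:
                search(term_idx + 1, pos + 1, positions + [pos], tier)
            elif terms[term_idx] in tokens[pos]:
                # substring hit: demote the whole match to tier 1
                search(term_idx + 1, pos + 1, positions + [pos], 1)

    search(0, 0, [], 0)
    return best


def rank(query: str, docs: List[str]) -> List[str]:
    """Drop non-matching documents and sort the rest by (tier, slop)."""
    scored = [(match_rank(query, d), d) for d in docs]
    matched = [(key, d) for key, d in scored if key is not None]
    return [d for key, d in sorted(matched, key=lambda pair: pair[0])]
```

With this rule, `rank("men's cream", DOCS)` keeps the six documents in the order listed above, because "men's" matches "women's" only as a substring.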
I can achieve the first 3 results with span_near having nested span_terms, with slop = 2 and in_order = true.
I'm not able to achieve it for the remaining results 4 to 6, because the span_terms nested inside span_near do not support wildcards, which I would need in this example so that the term "men's" also matches inside "women's".
Is there any way I can achieve this with Elasticsearch?
UPDATES
My index:
{
  "bluray": {
    "settings": {
      "index": {
        "uuid": "4jofvNfuQdqbhfaF2ibyhQ",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1000199"
        }
      }
    }
  }
}
Mapping:
{
  "bluray": {
    "mappings": {
      "movies": {
        "properties": {
          "genre": {
            "type": "string"
          }
        }
      }
    }
  }
}
I'm running the following query:
POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              {
                "span_term": {
                  "genre": "women"
                }
              },
              {
                "span_term": {
                  "genre": "cream"
                }
              }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "custom_boost_factor": {
            "query": {
              "match_phrase": {
                "genre": "women cream"
              }
            },
            "boost_factor": 4.1
          }
        },
        {
          "match": {
            "genre": {
              "query": "women cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}
It gives me the following result:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.011612939,
    "hits": [
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "u9aNkZAoR86uAiW9SX8szQ",
        "_score": 0.011612939,
        "_source": {
          "genre": "men's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "cpTyKrL6TWuJkXvliibVBQ",
        "_score": 0.009290351,
        "_source": {
          "genre": "men's wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "rn_SFvD4QBO6TJQJNuOh5A",
        "_score": 0.009290351,
        "_source": {
          "genre": "men's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "9a31_bRpR2WfWh_4fgsi_g",
        "_score": 0.004618556,
        "_source": {
          "genre": "women's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "q-DoBBl2RsON_qwLRSoh9Q",
        "_score": 0.0036948444,
        "_source": {
          "genre": "women's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "TxzCP8B_Q8epXtIcfgEw3Q",
        "_score": 0.0036948444,
        "_source": {
          "genre": "women's wrinkle cream"
        }
      }
    ]
  }
}
This is not correct at all. Why does it rank the men's documents first when I searched for women?
Note: searching for "men's cream" returns better results, but they still do not follow the search term sequence.