I'm new to Elasticsearch. I want to implement span_near-style matching that also handles substring matches, ranked after exact phrase matches and exact word-sequence matches.

For example:

Documents in my index:

  1. men's cream
  2. men's wrinkle cream
  3. men's advanced wrinkle cream
  4. women's cream
  5. women's wrinkle cream
  6. women's advanced wrinkle cream

If I search for "men's cream", I want the results in the same order as shown above. Expected search result:

  1. men's cream --> exact phrase match
  2. men's wrinkle cream --> search term sequence with slop 1
  3. men's advanced wrinkle cream --> search term sequence with slop 2
  4. women's cream --> substring near to exact phrase match
  5. women's wrinkle cream --> substring search term sequence with slop 1
  6. women's advanced wrinkle cream --> substring search term sequence with slop 2

I can get the first 3 results using span_near with nested span_term clauses, slop = 2 and in_order = true.
I'm not able to get results 4 to 6, because span_near with nested span_term clauses does not support wildcards (in this example, wildcard variants of "men's cream"). Is there any way I can achieve this using Elasticsearch?
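
For illustration, the span_near query described above might be built as follows. This is a minimal sketch, not the exact request I ran: the helper name `build_span_near` is hypothetical, the field name `genre` is taken from the mapping shown further down, and the terms are assumed to be passed exactly as they are stored in the index.

```python
import json


def build_span_near(field, terms, slop=2, in_order=True):
    """Build a search body with one span_term clause per term.

    Hypothetical helper for illustration only; it just assembles a dict.
    """
    return {
        "query": {
            "span_near": {
                # One span_term per search term, in the order given.
                "clauses": [{"span_term": {field: term}} for term in terms],
                # Maximum number of intervening positions allowed.
                "slop": slop,
                # Require the terms to appear in the given order.
                "in_order": in_order,
            }
        }
    }


body = build_span_near("genre", ["men's", "cream"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `POST /bluray/movies/_search` request.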

UPDATES
My index:

{
  "bluray": {
    "settings": {
      "index": {
        "uuid": "4jofvNfuQdqbhfaF2ibyhQ",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "1000199"
        }
      }
    }
  }
}

Mapping:

{
  "bluray": {
    "mappings": {
      "movies": {
        "properties": {
          "genre": {
            "type": "string"
          }
        }
      }
    }
  }
}

I'm running the following query:

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              {
                "span_term": {
                  "genre": "women"
                }
              },
              {
                "span_term": {
                  "genre": "cream"
                }
              }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "custom_boost_factor": {
            "query": {
              "match_phrase": {
                "genre": "women cream"
              }
            },
            "boost_factor": 4.1
          }
        },
        {
          "match": {
            "genre": {
              "query": "women cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}

It gives me the following result:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0.011612939,
      "hits": [
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "u9aNkZAoR86uAiW9SX8szQ",
            "_score": 0.011612939,
            "_source": {
               "genre": "men's cream"
            }
         },
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "cpTyKrL6TWuJkXvliibVBQ",
            "_score": 0.009290351,
            "_source": {
               "genre": "men's wrinkle cream"
            }
         },
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "rn_SFvD4QBO6TJQJNuOh5A",
            "_score": 0.009290351,
            "_source": {
               "genre": "men's advanced wrinkle cream"
            }
         },
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "9a31_bRpR2WfWh_4fgsi_g",
            "_score": 0.004618556,
            "_source": {
               "genre": "women's cream"
            }
         },
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "q-DoBBl2RsON_qwLRSoh9Q",
            "_score": 0.0036948444,
            "_source": {
               "genre": "women's advanced wrinkle cream"
            }
         },
         {
            "_index": "bluray",
            "_type": "movies",
            "_id": "TxzCP8B_Q8epXtIcfgEw3Q",
            "_score": 0.0036948444,
            "_source": {
               "genre": "women's wrinkle cream"
            }
         }
      ]
   }
}

This is not correct at all. Why does it rank the "men's" documents first when I searched for "women"?

Note: searching for "men's cream" returns better results, but they still do not follow the search-term sequence.

Saeed Zhiany
Kruti Shukla

I tried applying the index settings explained here: http://stackoverflow.com/questions/9421358/filename-search-with-elasticsearch, but it still does not return substring results in search-term order. I also used the gist provided here: http://sense.qbox.io/gist/db82c3fca956c8bffae19559b1fe3108c101e851, but that does not give the results I want either. – Kruti Shukla Apr 10 '14 at 13:14

1 Answer

POST /bluray/movies/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "span_near": {
            "clauses": [
              {
                "span_term": {
                  "genre": "women's"
                }
              },
              {
                "span_term": {
                  "genre": "cream"
                }
              }
            ],
            "collect_payloads": false,
            "slop": 12,
            "in_order": true
          }
        },
        {
          "match": {
            "genre": {
              "query": "women's cream",
              "analyzer": "standard",
              "minimum_should_match": "99%"
            }
          }
        }
      ]
    }
  }
}

This gives the output you expected:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0.7841132,
    "hits": [
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "4",
        "_score": 0.7841132,
        "_source": {
          "genre": "women's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "5",
        "_score": 0.56961054,
        "_source": {
          "genre": "women's wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "6",
        "_score": 0.35892165,
        "_source": {
          "genre": "women's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "genre": "men's advanced wrinkle cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "genre": "men's cream"
        }
      },
      {
        "_index": "bluray",
        "_type": "movies",
        "_id": "2",
        "_score": 0.11750762,
        "_source": {
          "genre": "men's wrinkle cream"
        }
      }
    ]
  }
}
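
To spell out why changing "women" to "women's" may matter, here is a rough Python sketch. It rests on stated assumptions rather than on Elasticsearch itself: `approx_tokens`, `span_term_matches` and `required_matches` are hypothetical helpers, lowercasing plus whitespace splitting is only a stand-in for the standard analyzer (which I assume keeps "women's" as a single token), span_term is assumed to compare its value against indexed tokens without analyzing it, and a positive minimum_should_match percentage is assumed to be rounded down.

```python
def approx_tokens(text):
    """Very rough stand-in for the standard analyzer (assumption):
    lowercase the text and split on whitespace, keeping apostrophes."""
    return text.lower().split()


def span_term_matches(term, text):
    """A span_term value is compared to indexed tokens as-is (not analyzed)."""
    return term in approx_tokens(text)


def required_matches(optional_clauses, percent):
    """Number of clauses a positive percentage demands, rounded down."""
    return int(optional_clauses * percent / 100)


doc = "women's cream"

# "women" is not one of the stored tokens, so the span clause finds nothing.
print(span_term_matches("women", doc))

# "women's" is stored as a token, so the span clause can match.
print(span_term_matches("women's", doc))

# Two query terms at 99% round down to a single required term.
print(required_matches(2, 99))
```

Under these assumptions, the original span_near on "women" contributes nothing, and the match clause needs only one of its two terms, so any document containing "cream" qualifies. That would be consistent with the "men's" documents appearing in the question's results.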
Rafiqul Islam