
What I want to achieve:

document: "one two three four"

search strings:

  • "one four" (must match)
  • "four one" (must not match)

What I've learned so far:

To take order into account, the span_near query should be used, but it expects the terms to have been analyzed by the client already (each term must be supplied separately).

To have the search string analyzed, the match_phrase query should be used, but with enough slop it also matches the terms in reverse order.

It's likely a script should be used (thanks @ChintanShah25), but it seems impossible to analyse the input string inside the script.

How can I satisfy both the analysis and the ordering requirements?

i_love_nachos

1 Answer


There is no straightforward way to achieve this. You can either combine the _analyze endpoint with a span query, or use a script together with match_phrase.

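1) Pass your search string to the _analyze endpoint. A custom analyzer lives in an index's settings, so the request goes to that index (my_index below is a placeholder for your index name):

curl -XGET 'localhost:9200/my_index/_analyze' -d '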
{
  "analyzer" : "my_custom_analyzer",
  "text" : "one four"
}'

You will get something like this:

{
   "tokens": [
      {
         "token": "one",
         "start_offset": 0,
         "end_offset": 3,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "four",
         "start_offset": 4,
         "end_offset": 8,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

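Then pass the tokens to a span_near query, one span_term clause per token (replace "field" with your field name and token1/token2 with the tokens returned above):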

{
    "span_near" : {
        "clauses" : [
            { "span_term" : { "field" : "token1" } },
            { "span_term" : { "field" : "token2" } }
        ],
        "slop" : 2,
        "in_order" : true,
        "collect_payloads" : false
    }
}
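
The client-side glue between the two requests can be quite small. Below is a minimal Python sketch, assuming the _analyze response has the shape shown above; the helper names (tokens_from_analyze, build_span_near) and the field name "body" are made up for illustration, and the HTTP calls themselves are left out. It also assumes one token per position, so an analyzer that emits synonyms at the same position would need extra handling.

```python
from typing import Any, Dict, List


def tokens_from_analyze(analyze_response: Dict[str, Any]) -> List[str]:
    """Return the analyzed tokens, ordered by their position in the input."""
    tokens = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def build_span_near(field: str, tokens: List[str], slop: int) -> Dict[str, Any]:
    """Build an ordered span_near query with one span_term clause per token."""
    return {
        "span_near": {
            "clauses": [{"span_term": {field: token}} for token in tokens],
            "slop": slop,
            "in_order": True,
        }
    }


# Example: a trimmed-down _analyze response for the text "one four".
analyze_response = {
    "tokens": [
        {"token": "one", "position": 1},
        {"token": "four", "position": 2},
    ]
}

# "body" is a hypothetical field name; use the field you actually search.
query = build_span_near("body", tokens_from_analyze(analyze_response), slop=2)
```

The resulting dictionary can be serialized to JSON and sent as the query part of the search request. The slop still has to be chosen by the caller: for the document "one two three four", a slop of 2 is what lets "one" and "four" match across the two words in between.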

2) Another way is to use advanced scripting. Have a look at @Andrei Stefan's answer to this question: he used _POSITIONS with match_phrase to return only results whose terms appear in order.
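
To make the idea of comparing term positions concrete, here is a rough sketch in plain Python. It is not the Elasticsearch script from the referenced answer and does not use any Elasticsearch API; it only illustrates the check such a script has to perform, under the assumption that slop counts the unmatched positions between the first and last matched term. The function names are invented for this example.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def term_positions(doc_tokens: Sequence[str]) -> Dict[str, List[int]]:
    """Map each token to the sorted list of positions where it occurs."""
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(i)
    return positions


def matches_in_order(doc_tokens: Sequence[str],
                     query_tokens: Sequence[str],
                     slop: int) -> bool:
    """True if the query tokens occur in the document in query order,
    with at most `slop` unmatched positions inside the matched span."""
    positions = term_positions(doc_tokens)
    if not query_tokens or any(t not in positions for t in query_tokens):
        return False
    for start in positions[query_tokens[0]]:
        current = start
        for term in query_tokens[1:]:
            candidates = positions[term]
            # Earliest occurrence strictly after the previous term.
            idx = bisect_right(candidates, current)
            if idx == len(candidates):
                break  # no later occurrence; try the next start position
            current = candidates[idx]
        else:
            gaps = current - start - (len(query_tokens) - 1)
            if gaps <= slop:
                return True
    return False


doc = "one two three four".split()
in_order = matches_in_order(doc, ["one", "four"], slop=2)
reversed_order = matches_in_order(doc, ["four", "one"], slop=2)
```

With the document from the question, in_order is True and reversed_order is False, which is the behaviour being asked for. Note that both the document and the query are already tokenized here, so the analysis step still has to happen somewhere before this comparison.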

Hope this helps!

ChintanShah25
  • Thanks for the answer. As far as I can tell, @AndreiStefan's solution requires the search string to be processed (tokenised) on the client's side, which is bad news. I guess I am going to have to go with the double-request solution then. Could I write a script that analyses a search string and uses the resulting tokens to do what Andrei Stefan's script does? – i_love_nachos Dec 24 '15 at 14:11
  • @AndreiStefan's solution is a single request solution, you just pass all terms to script and compare their positions, only downside to that solution is that `script` can affect performance – ChintanShah25 Dec 27 '15 at 02:13