
I use Solr's proximity search quite often to search for words within a specified range of each other, like so:

"Government Spending"~2

I was wondering whether there is a way to perform a proximity search using a phrase and a word, or two phrases. Is this possible? If so, what is the syntax?

halfer
Ruth

3 Answers


This appears to be "somewhat" doable. Consider this text:

This is more about traffic between Solr servers themselves

Both of these queries match it:

"more traffic between solr"~2

"more about between solr"~2

It works even if you change the word order:

"more about solr between"~2

But if the words are too far apart, it stops matching:

"more about servers themselves"~2
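The results above can be approximated with a simplified model of Lucene-style sloppy phrase matching: shift each term's document position back by its offset in the query, and accept the match if the spread (max minus min) of the shifted positions is at most the slop. Below is a minimal Python sketch of that model, assuming whitespace tokenization, lowercasing and no stopword removal (a real field's analyzer may behave differently); `sloppy_phrase_match` is a hypothetical helper, not a Solr API.

```python
from itertools import product


def sloppy_phrase_match(text, query, slop):
    """Return True if `query` matches `text` as a sloppy phrase with `slop`.

    Simplified model: whitespace tokens, lowercased, no stopwords, and
    repeated query terms are not treated specially.
    """
    doc = text.lower().split()
    terms = query.lower().split()
    # Every document position at which each query term occurs.
    candidates = [[pos for pos, tok in enumerate(doc) if tok == term]
                  for term in terms]
    if not terms or any(not positions for positions in candidates):
        return False  # a term that never occurs can never match
    best_spread = None
    for combo in product(*candidates):
        # Shift each position back by the term's offset within the query,
        # so an exact phrase yields identical shifted positions (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best_spread is None or spread < best_spread:
            best_spread = spread
    return best_spread <= slop


TEXT = "This is more about traffic between Solr servers themselves"
for q in ("more traffic between solr", "more about between solr",
          "more about solr between", "more about servers themselves"):
    print(f'"{q}"~2 ->', sloppy_phrase_match(TEXT, q, 2))
```

With the sample sentence, the first three queries have spreads of 1, 1 and 2, so they fit within a slop of 2, while the last has a spread of 3 and does not.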

If that doesn't work, I think it would probably not be TOO hard to write a custom request handler that does this. You might need to define a new syntax, perhaps something like ("phrase one" "phrase two")~2. My guess is that if you are shingling (indexing word n-grams as single tokens), and you build a Lucene query in which "phrase one" and "phrase two" are each a single token that must occur within a certain proximity of each other, it will work. (Of course, you will need to make the Lucene Java call yourself; you can't just hand the query string over. See http://lucene.apache.org/java/2_2_0/api/index.html.)
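To make the shingling idea concrete, here is a minimal Python sketch (plain Python, not Solr or Lucene code). It assumes whitespace tokenization, lowercasing, and 2-word shingles placed at the position of their first word; `shingle` and `phrases_near` are hypothetical helpers. Once each phrase is a single token, "near" reduces to comparing two token positions.

```python
from itertools import product


def shingle(tokens, size=2):
    """Return (token, position) pairs: every word plus every `size`-word shingle.

    Each shingle is placed at the position of its first word (an assumption
    about how the hypothetical index is built).
    """
    pairs = [(tok, pos) for pos, tok in enumerate(tokens)]
    for pos in range(len(tokens) - size + 1):
        pairs.append((" ".join(tokens[pos:pos + size]), pos))
    return pairs


def phrases_near(text, phrase_a, phrase_b, max_distance):
    """True if the two phrases (1 or 2 words each) start within
    `max_distance` positions of each other, in either order."""
    positions = {}
    for tok, pos in shingle(text.lower().split()):
        positions.setdefault(tok, []).append(pos)
    starts_a = positions.get(phrase_a.lower(), [])
    starts_b = positions.get(phrase_b.lower(), [])
    return any(abs(b - a) <= max_distance
               for a, b in product(starts_a, starts_b))


print(phrases_near("a power cord for the dock", "power cord", "dock", 10))
print(phrases_near("a power cord for the dock", "cord power", "dock", 10))
```

Because phrases are matched as whole tokens, the reversed phrase "cord power" finds nothing here. Phrases longer than the shingle size would need a larger `size`.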

mlathe

I have discovered a way to perform a Solr proximity search out of the box using more than one word, or phrases:

For example, with 3 words:

"(word1) (word2) (word3)"~10

With 2 phrases (note that the inner double quotes need to be escaped):

"(\"phrase1\") (\"phrase2\")"~10
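If you generate these queries programmatically, the escaping is easy to get wrong. Below is a minimal Python sketch that builds the two shapes shown above; the helper name `grouped_proximity_query` is hypothetical. It only assembles the query string. As a commenter reports for Solr 9.0.0, Solr may not treat the inner groups as phrases, so check the results against your own index.

```python
from urllib.parse import urlencode


def grouped_proximity_query(parts, slop, quote_parts=False):
    """Build e.g. "(word1) (word2)"~10 or "(\"phrase1\") (\"phrase2\")"~10."""
    groups = []
    for part in parts:
        # The whole expression is already inside double quotes, so any
        # inner quotes must be escaped with a backslash.
        inner = '\\"' + part + '\\"' if quote_parts else part
        groups.append("(" + inner + ")")
    return '"' + " ".join(groups) + '"~' + str(slop)


words_q = grouped_proximity_query(["word1", "word2", "word3"], 10)
phrases_q = grouped_proximity_query(["phrase1", "phrase2"], 10,
                                    quote_parts=True)
print(words_q)    # "(word1) (word2) (word3)"~10
print(phrases_q)  # "(\"phrase1\") (\"phrase2\")"~10
# URL-encode the query before putting it on a request URL.
print(urlencode({"q": phrases_q}))
```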

Ruth
    This didn't restrict results by phrase for me (in Solr 9.0.0). For example, searching the example "techproducts" data with `"(\"cord power\") (\"dock\")"~10` returns one hit even though the term "cord power" does not appear in the document, while "power cord" does appear. – user9712582 Jul 08 '22 at 14:41

Since Solr 4, this is possible with the SurroundQueryParser (select it with `defType=surround` or the `{!surround}` local-params prefix).

For example, to match documents where "phrase two" follows "phrase one" by no more than 3 words:

3W(phrase W one, phrase W two)

To match "phrase two" within 5 words of "phrase one", in either order:

5N(phrase W one, phrase W two)
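Below is a minimal Python sketch that assembles the Surround expressions shown above and the request parameters to send them. The helper names and the `text` field are hypothetical, and it assumes Solr reads the default field for Surround queries from `df`. Multi-word phrases are chained with `W` (e.g. `a W b W c`), which is worth verifying on your Solr version.

```python
from urllib.parse import urlencode


def surround_phrase(phrase):
    # Adjacent words become an ordered "W" chain, e.g. "phrase W one".
    return " W ".join(phrase.split())


def surround_proximity(phrase_a, phrase_b, distance, ordered=True):
    # "W" requires phrase_b to follow phrase_a; "N" allows either order.
    op = "W" if ordered else "N"
    return (f"{distance}{op}("
            f"{surround_phrase(phrase_a)}, {surround_phrase(phrase_b)})")


def surround_params(query, field="text"):
    # defType=surround selects the Surround parser; df names the default
    # field ("text" is a hypothetical field name).
    return urlencode({"q": query, "defType": "surround", "df": field})


q = surround_proximity("phrase one", "phrase two", 3)
print(q)  # 3W(phrase W one, phrase W two)
# Unordered variant: 5N(phrase W one, phrase W two)
print(surround_proximity("phrase one", "phrase two", 5, ordered=False))
print(surround_params(q))
```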
Andrey