7

I can't seem to figure out how to find substring matches with Solr. I've figured out matches based on a prefix, so I can get ham to match hamburger.

How would I get a search for 'burger' to match hamburger as well? I tried *burger, but this threw the error: '*' or '?' not allowed as first character in WildcardQuery.

How can I match substrings using Solr?

javanna
Michael

3 Answers

9

If anyone ends up here after searching for "apachesolr substring", there's a simpler solution for this: https://drupal.stackexchange.com/a/27956/10419 (from https://drupal.stackexchange.com/questions/26024/how-can-i-make-search-with-a-substring-of-a-word)

Add an n-gram filter to the text type definition in schema.xml in the Solr config directory.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- keep the existing tokenizer and filters of the text type here -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" />
  </analyzer>
</fieldType>
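
To see what such a filter puts into the index, here is a minimal Python sketch (not Solr code; edge_ngrams and ngrams are hypothetical helper names) that mimics gram generation with the sizes from the snippet above. It assumes edge grams anchored at the front of the term, which is the default for EdgeNGramFilterFactory. Under that assumption only prefixes such as ham and hambu are indexed, so matching an inner substring like burger would need solr.NGramFilterFactory instead.

```python
def edge_ngrams(term, min_size, max_size):
    """Grams anchored at the front of the term (assumed edge n-gram behaviour)."""
    upper = min(max_size, len(term))
    return [term[:n] for n in range(min_size, upper + 1)]


def ngrams(term, min_size, max_size):
    """Every contiguous gram of the term within the size bounds."""
    upper = min(max_size, len(term))
    return [
        term[start:start + n]
        for n in range(min_size, upper + 1)
        for start in range(len(term) - n + 1)
    ]


# Same bounds as minGramSize="3" maxGramSize="25" in the config above.
edge = edge_ngrams("hamburger", 3, 25)
full = ngrams("hamburger", 3, 25)

print("burger" in edge)      # False: only prefixes of "hamburger" are produced
print("burger" in full)      # True: inner substrings are produced as well
print(len(edge), len(full))  # 7 28
```

The full n-gram list is four times longer than the edge list for this single nine-letter term, which gives a rough feel for the extra index size that substring matching costs.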
Community
Paul
  • This has been haunting me for weeks. Thank you for posting, it solved my issue with trying to filter/search based on substring. – Tyler Ferraro Jul 31 '15 at 05:23
  • This would not work for very large data sets. Edge gram field will need a lot of memory while indexing data. – alpeshpandya May 02 '17 at 18:27
3

You can enable leading wildcards, but they will be very resource hungry (e.g. search for SuffixQuery).

See: http://lucene.472066.n3.nabble.com/Leading-Wildcard-Search-td522362.html

Quoting the mailing list: Work arounds? Imagine making a second index (or adding another field) with all of the terms spelled backwards.
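
The reversed-terms workaround can be sketched in plain Python (an illustration only, not Lucene or Solr code; build_reversed_index and leading_wildcard_search are hypothetical names). Each term is stored spelled backwards, so a leading-wildcard search such as *burger becomes a cheap prefix lookup for regrub in a sorted list.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Sorted list of (reversed term, original term) pairs."""
    return sorted((term[::-1], term) for term in terms)


def leading_wildcard_search(reversed_index, suffix):
    """Find terms ending in `suffix` via a prefix scan over reversed terms."""
    prefix = suffix[::-1]
    # A 1-tuple sorts before any pair sharing the same first element,
    # so this lands on the first reversed term >= prefix.
    start = bisect_left(reversed_index, (prefix,))
    matches = []
    for reversed_term, original in reversed_index[start:]:
        if not reversed_term.startswith(prefix):
            break  # sorted order: no later entry can match
        matches.append(original)
    return matches


index = build_reversed_index(["hamburger", "cheeseburger", "hamster", "burger"])
print(leading_wildcard_search(index, "burger"))
# ['burger', 'cheeseburger', 'hamburger']
```

The trade-off is the one the quote hints at: every term is stored a second time, in exchange for turning an expensive scan over all terms into a short range scan.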

=>

See Add ReverseStringFilter https://issues.apache.org/jira/browse/LUCENE-1398

and Support for efficient leading wildcards search: https://issues.apache.org/jira/browse/SOLR-1321

At the moment issues.apache.org seems to be down. Try e.g. the Google cache.

Karussell
3

As stated before in the link, you can use leading wildcards with edismax (ExtendedDismaxQParser). Just try it out to see whether it is fast enough.
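
A quick way to try this is to build the request URL by hand. The sketch below assumes a local Solr at http://localhost:8983/solr and a core named mycore (both hypothetical), and that your Solr version accepts leading wildcards with edismax as the answer describes. defType and q are standard Solr request parameters; build_edismax_url is an illustrative helper name.

```python
from urllib.parse import urlencode


def build_edismax_url(base_url, query):
    """Build a /select URL that routes the query through the edismax parser."""
    params = {"defType": "edismax", "q": query}
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and core name; adjust to your installation.
url = build_edismax_url("http://localhost:8983/solr/mycore", "*burger")
print(url)
```

Open the printed URL in a browser or with curl and check the QTime value in the response to judge whether it is fast enough on your data.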

Some more info about the above mentioned reversedstring can also be found here: solr.ReversedWildcardFilterFactory

Community
Jem