
I have a Solr instance with a suggester component. It works fine, using the AnalyzingInfixLookupFactory implementation.

However, I want to extend the suggestions to a content field, which can contain a lot of text. The suggester finds matches, but it returns the entire field value instead of just a sentence or part of a sentence.

So, if I want a suggestion for "foo", and the content field contains text like:

"I really like pizza. And donuts. Let's get some from that other place. The foo bar place."

The suggestion will be that entire text, instead of just "The foo bar place". And, obviously, when the content is hundreds of words long, this is just not usable.

Is there a way to limit the number of returned words for a suggestion?

Here's my search component:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">autocomplete</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggest</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
    <bool name="highlight">false</bool>
    <str name="payloadField">label</str>
  </lst>
</searchComponent>

And here's the request handler:

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">autocomplete</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Finally, here is the field from which the suggestions are derived:

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="suggest" type="text_suggest" indexed="true" multiValued="true" stored="true"/>

I then use a bunch of <copyField>s to copy the content over.
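For illustration, a copyField setup of this kind looks roughly like the following. Only `content` comes from this question; `title` is a hypothetical second source field, since the actual source fields are not listed here.

```xml
<!-- Copy the long-text content field into the multivalued suggest field -->
<copyField source="content" dest="suggest"/>
<!-- Hypothetical: any other source fields are copied the same way -->
<copyField source="title" dest="suggest"/>
```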

EDIT 2015-08-28

The content field definition is as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="txt/mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="txt/stopwords.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="txt/mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="content" type="text" indexed="true" stored="true" termVectors="true"/>

EDIT 2016-09-28

This issue is probably related: Is Solr SuggestComponent able to return shingles instead of whole field values?

wadmiraal
  • What does the field type of content look like? – Uwe Allner Aug 28 '15 at 08:03
  • Updated question accordingly. – wadmiraal Aug 28 '15 at 08:07
  • can you add some sample data as well? – YoungHobbit Sep 23 '15 at 10:16
  • What do you mean by "sample data"? Is my example of _"I really like pizza. And donuts. Let's get some..."_ not enough? – wadmiraal Oct 08 '15 at 06:40
  • Do I understand correctly: you always want the phrase returned? In your example, you show that the words before the 'suggestion' word are returned. What do you expect when someone types the last word in the sentence (i.e. "place" in your example)? Another quick question: could the content field be multivalued? – MikeMatusiak Nov 09 '15 at 11:45
  • I want to *limit* the number of words returned. Currently, it returns the *entire field value*, which can be a very long text. I just want a few words, or simply a sentence. And yes, fields could be multivalued, but that is not a hard requirement at this stage. – wadmiraal Dec 07 '15 at 14:21

1 Answer


I think what you might be looking for is solr.ShingleFilterFactory, which lets you limit token size by word count, rather than by character length as solr.NGramFilterFactory (which you've been trying to use) does.
See the Solr wiki page for more details:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
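A minimal sketch of how shingles could be wired in, assuming a new field `suggest_shingle` with a field type `text_shingle` (both hypothetical names). One caveat: `DocumentDictionaryFactory` builds suggestions from stored field values, so changing the analyzer alone would still return whole values. The sketch therefore switches the dictionary to `HighFrequencyDictionaryFactory`, which reads the field's indexed terms (here, the shingles) instead.

```xml
<!-- schema.xml (hypothetical names): index word n-grams of 2 to 4 words -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Each indexed term becomes a 2-4 word phrase; single words are kept too -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
</fieldType>

<!-- Only the indexed terms are needed, so the field need not be stored -->
<field name="suggest_shingle" type="text_shingle" indexed="true" stored="false" multiValued="true"/>
<copyField source="content" dest="suggest_shingle"/>

<!-- solrconfig.xml: build the dictionary from indexed terms, not stored values -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">autocomplete</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">suggestions</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">suggest_shingle</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

The 2 to 4 shingle sizes are arbitrary and worth tuning. The shingle filter knows nothing about sentences, so a suggestion such as "pizza and donuts" can span a sentence boundary. `HighFrequencyDictionaryFactory` does not use a payload field, so the `label` payload from the original config is dropped in this sketch.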

llesiuk