I am trying to test the spellchecking functionality with Solr 4.7.2 using solr.DirectSolrSpellChecker (where you don't need to build a dedicated index).

I have a field named "title" in my index; I used a copyField definition to create a field named "title_spell" to be queried for the spellcheck (title_spell is correctly filled). However, in the Solr admin console, I always get empty suggestions.

For example: I have a Solr document with the title "A B automobile". In the admin console I tick the spellcheck checkbox and enter "atuomobile" in the spellcheck.q input field below it. I expect to get at least something like "A B automobile" or "automobile", but the spellcheck suggestion remains empty.

My configuration:

schema.xml (only relevant part copied):

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StandardFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="de_DE/synonyms.txt" ignoreCase="true"
                    expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StandardFilterFactory"/>
        </analyzer>
    </fieldType>
    ...
    <field name="title_spell" type="textSpell" indexed="true" stored="true" multiValued="false"/>

solrconfig.xml (only relevant part copied):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">title_spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <float name="maxQueryFrequency">0.01</float>
        <float name="thresholdTokenFrequency">.01</float>
    </lst>
</searchComponent>
...
<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
    </lst>
    <!-- Attempt to factor the online date into the weighting... -->
    <lst name="appends">
        <str name="bf">recip(ms(NOW/MONTH,sort_date___d_i_s),3.16e-11,50,1)</str>
        <!--<str name="qf">title___td_i_s_gcopy^1e-11</str>-->
        <str name="qf">title___td_i_s_gcopy^21</str>
        <str name="q.op">AND</str>
    </lst>


    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>

What did I miss? Thanks for your answers!

R..

2 Answers

How large is your index? For a small index (think less than a few million docs), you're going to have to tune accuracy, maxQueryFrequency, and thresholdTokenFrequency. (Actually, it would probably be worth doing this on larger indices as well.)

For example, my 1.5 million doc index uses the following for these settings:

      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">.00001</float>
      <float name="accuracy">0.5</float>

accuracy is the minimum similarity score (between 0 and 1) a candidate must reach before it's considered worth returning as a suggestion.

maxQueryFrequency is the maximum fraction of documents a query term may appear in and still be treated as a possible misspelling; a term that occurs more often than that is assumed to be spelled correctly and gets no suggestions.

thresholdTokenFrequency is the minimum fraction of documents a term must appear in before it's considered worth returning as a suggestion.
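
To see why these settings matter on a small index, take an index of about 1500 documents (an assumed figure for illustration): a thresholdTokenFrequency of .01 asks a candidate term to occur in roughly 15 documents, so a word that appears in only a handful of titles would likely never be offered. The following is a minimal sketch of the question's spellchecker with that threshold lowered. The value 0.0001 is an assumption to experiment with, not a recommended setting, and the exact rounding of the document count is up to the underlying Lucene spellchecker.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">title_spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">1</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">4</int>
        <!-- Unchanged: query terms found in more than about 1% of documents are not corrected -->
        <float name="maxQueryFrequency">0.01</float>
        <!-- Assumed value: small enough that a term occurring in a single document can qualify -->
        <float name="thresholdTokenFrequency">0.0001</float>
    </lst>
</searchComponent>
```

After changing the values, reload the core and repeat the same misspelled query to compare the suggestions.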

If you plan to use spellchecking on multiple phrases, you may need to add a ShingleFilter to your title_spell field.
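
As a rough sketch of the ShingleFilter idea, the fragment below defines a separate field type whose index contains word pairs in addition to single words. The names textSpellPhrase and title_spell_phrase are hypothetical, the shingle size of 2 is an assumption, and the copyField source assumes the "title" field from the question.

```xml
<!-- Hypothetical field type; the name and shingle size are assumptions -->
<fieldType name="textSpellPhrase" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Emits single words plus two-word shingles such as "b automobile" -->
        <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>

<!-- Hypothetical field filled from the existing title field -->
<field name="title_spell_phrase" type="textSpellPhrase" indexed="true" stored="true" multiValued="false"/>
<copyField source="title" dest="title_spell_phrase"/>
```

A spellchecker pointed at such a field would then see multi-word terms as candidates, at the cost of a larger term dictionary.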

Another thing you might try is double-checking queryAnalyzerFieldType. It expects the name of a field type (textSpell in your schema), not a field name such as title_spell.

TMBT
  • The index is really small: 1500 documents at the moment, and it won't grow beyond 10000 in the next few years. I've tried changing queryAnalyzerFieldType, but it didn't help. I will try tuning the other parameters. – R.. Aug 25 '15 at 08:54
  • You definitely need to tune those parameters, then. Your threshold and query frequency are probably going to be really, really low. – TMBT Aug 25 '15 at 13:35

Can you try changing your requestHandler declaration to:

<requestHandler name="/standard" class="solr.SearchHandler" default="true">

and the query URL to:

http://localhost:8080/solr/service/standard?q=<term>&qf=title_spell
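
Note that the spellcheck component only produces output when the request carries spellcheck=true, which the URL above does not include. One way to avoid passing it every time is to set it in the handler defaults. The fragment below is a sketch under that assumption; the spellcheck.count of 5 is an arbitrary illustrative value, and the other defaults are copied from the question.

```xml
<requestHandler name="/standard" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <!-- Assumed defaults: run the spellcheck component on every request -->
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.count">5</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>
```

With defaults like these, a request can still switch spellchecking off by sending spellcheck=false explicitly.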

Experiment with short terms first to learn how it behaves. One problem here is that it will only return terms that start with the query term. You can use FuzzyLookupFactory instead, which matches and returns fuzzy results. For more information, see the Solr Suggester wiki.
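
For orientation, a suggester backed by FuzzyLookupFactory might be declared roughly as follows. This is a sketch, assuming the SuggestComponent available in recent Solr 4.x releases; the names fuzzySuggester and /suggest are hypothetical, and reusing title_spell and textSpell from the question is an assumption rather than something tested against that setup.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <!-- Hypothetical suggester name -->
        <str name="name">fuzzySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <!-- Assumed: reuse the spellcheck field and field type from the question -->
        <str name="field">title_spell</str>
        <str name="suggestAnalyzerFieldType">textSpell</str>
    </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">fuzzySuggester</str>
        <str name="suggest.count">5</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
```

The suggester has to be built before it returns anything, for example by sending suggest.build=true once after indexing.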

YoungHobbit
  • I've tried to edit the requestHandler but it didn't help. I will look into the configuration for fuzzy search – R.. Aug 25 '15 at 08:44
  • @R.. which mode are you running Solr in? If you are using SolrCloud mode, then query like [this](http://stackoverflow.com/questions/32045700/solr-suggester-in-solrcloud-mode). – YoungHobbit Aug 26 '15 at 07:34