
I have followed the spellcheck example from the Solr documentation.

The configs I have used:

<searchComponent name="spellcheck_new" class="solr.SpellCheckComponent">
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">name_spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents -->
    <!-- <float name="thresholdTokenFrequency">.01</float> -->
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">name_spell</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

Handler:

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>       
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>       
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>  
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>         
    </lst>
    <arr name="last-components">
      <str>spellcheck_new</str>
    </arr>
  </requestHandler>

Schema Fields:

    <field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/>
    <copyField source="attribute_key" dest="spell_check_field" />
    <field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/>
    <copyField source="attribute_key" dest="name_spell" />
    <field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/>
    <copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/>
    <field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" />
    <defaultSearchField>attribute_key</defaultSearchField>

The suggestions work perfectly, but the collations array is always empty for every query.

Example query:

http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true

Results:

    {
      "responseHeader": {
        "zkConnected": true,
        "status": 0,
        "QTime": 60
      },
      "response": {
        "numFound": 0,
        "start": 0,
        "docs": []
      },
      "spellcheck": {
        "suggestions": [
          "nike",
          {
            "numFound": 6,
            "startOffset": 0,
            "endOffset": 4,
            "origFreq": 2,
            "suggestion": [
              { "word": "n i k e", "freq": 19 },
              { "word": "nine", "freq": 1 },
              { "word": "none", "freq": 29 },
              { "word": "note", "freq": 5 },
              { "word": "nicka", "freq": 2 },
              { "word": "nino", "freq": 2 }
            ]
          },
          "shoes",
          {
            "numFound": 10,
            "startOffset": 5,
            "endOffset": 10,
            "origFreq": 0,
            "suggestion": [
              { "word": "shoe", "freq": 30 },
              { "word": "shoe s", "freq": 30 },
              { "word": "short", "freq": 30 },
              { "word": "s h o e s", "freq": 4 },
              { "word": "sheer", "freq": 15 },
              { "word": "sheen", "freq": 4 },
              { "word": "sheet", "freq": 3 },
              { "word": "shower", "freq": 2 },
              { "word": "shock", "freq": 1 },
              { "word": "shred", "freq": 1 }
            ]
          }
        ],
        "correctlySpelled": false,
        "collations": []
      }
    }

How do I turn collations on?

starkk92
  • Have you solved this? I am also facing the same problem: collations are always empty and correctlySpelled is always false. – userab Jul 21 '17 at 08:47

2 Answers


Let's first look at the definition of SpellCheck Collate in the documentation:

Causes Solr to build a new query based on the best suggestion for each term in the submitted query.

Long story short, when you specify spellcheck.collate=true you are asking Solr to recommend a new query, built from the suggestions, that you could re-execute directly instead of having to combine the individual suggestions yourself. Let me show you with a couple of examples.

  • Let's say that you want to search for:

initial audit

  • and, for whatever reason, it was typed as:

initila audti

  • With collate set to false, you would get back the following spellcheck suggestions:

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audt">
            <int name="numFound">4</int>
            <int name="startOffset">9</int>
            <int name="endOffset">13</int>
            <arr name="suggestion">
                <str>aud t</str>
                <str>audit</str>
                <str>au dt</str>
                <str>audi</str>
            </arr>
        </lst>
    </lst>

This means you get several suggestions per word.

  • But if you turn on collation, you will most likely also get, if one exists, a recommendation for the query that should be executed. It is not guaranteed to be the best one, though; think of it as one good guess that can help you.

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audti">
            <int name="numFound">5</int>
            <int name="startOffset">9</int>
            <int name="endOffset">14</int>
            <arr name="suggestion">
                <str>audit</str>
                <str>audt i</str>
                <str>auditi</str>
                <str>au dti</str>
                <str>audtis</str>
            </arr>
        </lst>
        <lst name="collation">
            <str name="collationQuery">initial audit</str>
            <int name="hits">1983</int>
            <lst name="misspellingsAndCorrections">
                <str name="initila">initial</str>
                <str name="audti">audit</str>
            </lst>
        </lst>
    </lst>
    

This would be the recommended query:

initial audit

which comes from this element:

<str name="collationQuery">initial audit</str>

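Collations are only returned when the recommended query actually matches documents in your index (Solr checks this by running the candidate collations as queries when spellcheck.maxCollationTries is greater than 0).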
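One way to check whether that verification step is what removes the collations is to switch it off while debugging. The fragment below is a sketch, assuming a request handler like the /spell one in the question: with spellcheck.maxCollationTries set to 0 (its documented default), Solr builds collations from the suggestions but does not run them against the index, so they are returned even if they would match nothing.

```xml
<!-- Diagnostic sketch only (assumption: used temporarily for debugging).
     With maxCollationTries = 0, Solr does not test candidate collations
     against the index, so they are returned without verification. -->
<lst name="defaults">
  <str name="spellcheck">on</str>
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">0</str>
</lst>
```

The same value can also be passed per request as spellcheck.maxCollationTries=0 in the URL. If collations appear with this setting but disappear with a positive value such as the question's 10, the test queries are being discarded, which suggests looking at how those queries are parsed and which field they search rather than at the dictionary.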

xmorera
  • You have explained how collations work, but can you please also look into the actual issue, i.e. 'But the collations array is always empty for all the queries'? Why is the collations array always empty? – userab Jul 21 '17 at 08:50
  • One possibility is that the dictionary has not been built, but it is more likely that the term being sought does not yet reach the threshold required for a suggestion to be returned. Check out this other post: https://stackoverflow.com/questions/6653186/solr-suggester-not-returning-any-results – xmorera Jul 25 '17 at 04:33
  • I have built the dictionary, and the threshold is also low. You may check my other answer: collation works with q but not with spellcheck.q when the default field is not specified. I am not sure why it behaves like that. – userab Jul 25 '17 at 06:49

The following approaches solved my problem:

  1. Under the requestHandler, add a default field as a child of the defaults list, i.e. <str name="df">name_spell</str>. Executing your query now returns collation results, and either q or spellcheck.q may be used.

OR

  2. Use q instead of spellcheck.q, and when using q, specify the field: instead of spellcheck.q=nike%20shoes use q=name_spell:(nike%20shoes), and it returns collation results.
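Applied to the /spell handler from the question, the first approach might look like the sketch below. Only the df line is new; the other defaults are copied from the question, and spellcheck_new is assumed to be the name of the SpellCheckComponent defined in solrconfig.xml. A plausible explanation for the behaviour (an assumption, not confirmed in this thread) is that, with spellcheck.maxCollationTries greater than 0, Solr runs each candidate collation as a query, and without a usable default field those test queries do not search name_spell, so every candidate is discarded.

```xml
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Added: default field for queries that name no field
         (assumed here to cover the collation test queries as well) -->
    <str name="df">name_spell</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <!-- assumed name of the spellcheck searchComponent -->
    <str>spellcheck_new</str>
  </arr>
</requestHandler>
```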
userab