For example, I have the synonyms laptop,netbook,notebook in index_synonyms.txt.

When a user searches for netbook, I want matches on the original term to be boosted more than matches on terms added by synonym expansion. Is there a way to specify this in SynonymFilterFactory? For example, by emitting the original term twice so that its TF is higher.

yura

1 Answer

As far as I know, there is no way to do this with the existing SynonymFilterFactory. But here is a trick you can use to get this behavior.

Let's say your field is called title. Create another field that is a copy of it, say title_synonyms. Then make sure SynonymFilterFactory appears in the analyzer chain only for title_synonyms (you can do this by using different field types for the two fields, say text and text_synonyms). Search in both fields, but give a higher boost to title than to title_synonyms. A document that contains the exact query term then matches in both fields, while a document that matches only through a synonym scores on the lower-boosted title_synonyms alone.

Here are sample field type definitions:

    <fieldType name="text" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="text_synonyms" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
    </fieldType>

And here are sample field definitions:

    <field name="title" type="text" stored="false"
           required="true" multiValued="true"/>
    <field name="title_synonyms" type="text_synonyms" stored="false"
           required="true" multiValued="true"/>

Copy the title field to title_synonyms:

    <copyField source="title" dest="title_synonyms"/>

If you are using dismax, you can give different boosts to these fields like so:

    <str name="qf">title^10 title_synonyms^1</str>
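
To make these boosts the default for every request, the qf line can sit in the defaults of a search handler in solrconfig.xml. This is only a minimal sketch: the handler name /select is an assumption, and the boost values are simply the ones from the line above, not tuned numbers.

    <!-- Sketch only: the handler name "/select" is an assumption -->
    <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
            <!-- Use the dismax query parser so that qf is honored -->
            <str name="defType">dismax</str>
            <!-- Exact-term field weighted above the synonym-expanded copy -->
            <str name="qf">title^10 title_synonyms^1</str>
        </lst>
    </requestHandler>

The same parameters can also be passed per request (defType=dismax and qf=...), which is handy while experimenting with the boost ratio.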
Siddhartha Reddy
  • Really nice idea! But in my case I have about 10 fields that need synonyms, so I will do this only if there are no other workarounds (Solr patches, etc.). – yura May 19 '12 at 08:21
  • If you are using the same synonyms file for all those fields, you can copy all of them into one common synonyms field; you don't need one synonyms field corresponding to each field. – Siddhartha Reddy May 19 '12 at 13:17
  • But I use fine-grained weights for all fields, so a synonym match in title is more important than a synonym match in description, etc. – yura May 23 '12 at 07:47
  • Multi-word searches are problematic with query-time synonyms. See: [SynonymFilterFactory documentation](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory) – Th 00 mÄ s Dec 12 '13 at 13:40
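
For the fine-grained weighting case raised in the comments, one possible layout is to give each weighted field its own synonyms copy, reusing the text and text_synonyms types from the answer. This is a sketch under assumptions: the description and description_synonyms fields and all boost values here are hypothetical and would need tuning.

    <!-- Hypothetical second field pair, reusing the field types from the answer -->
    <field name="description" type="text" stored="false" multiValued="true"/>
    <field name="description_synonyms" type="text_synonyms" stored="false" multiValued="true"/>

    <copyField source="description" dest="description_synonyms"/>

    <!-- Illustrative boosts: each exact field outranks its own synonyms copy,
         and title outranks description at both levels -->
    <str name="qf">title^10 title_synonyms^3 description^4 description_synonyms^1</str>

This doubles the number of indexed fields that need synonyms, so index size grows accordingly; the single shared synonyms field suggested above is cheaper when one weight for all synonym matches is acceptable.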