15

I'm trying to set up some basic synonyms in Solr. The one I've been working on is:

us, usa, united states

My understanding is that adding that to the synonym file will allow users to search for US and get back documents containing usa or united states, and likewise if a user searches for usa or united states.

Unfortunately, with this in place, when I do a search, I get the results for items that contain all three of the words - it's doing an AND of the synonyms rather than an OR.

If I turn on debugging, this is indeed what I see (plus some stemming):

(+DisjunctionMaxQuery(((westCite:us westCite:usa westCite:unit) | (text:us text:usa text:unit) | (docketNumber:us docketNumber:usa docketNumber:unit) | ((status:us status:usa status:unit)^1.25) | (court:us court:usa court:unit) | (lexisCite:us lexisCite:usa lexisCite:unit) | ((caseNumber:us caseNumber:usa caseNumber:unit)^1.25) | ((caseName:us caseName:usa caseName:unit)^1.5))))/no_coord

Am I doing something wrong to cause this? My defaultOperator is set to AND, but I'd expect the synonym filter to understand that.

mlissner

3 Answers

24

Try using the SynonymFilterFactory during indexing only, not during querying.

The documentation suggests this as well: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
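
As a rough sketch of what index-time-only synonyms could look like in the schema: the field type name text_syn and the file name synonyms.txt below are illustrative assumptions, not taken from the question, and the tokenizer and lowercase filter stand in for whatever the real chain uses (the question's chain also has a stemmer, which is omitted here). The synonym filter appears in the index analyzer and is left out of the query analyzer:

```xml
<!-- Hypothetical field type: the name "text_syn" and the file "synonyms.txt" are assumptions for illustration. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand synonyms so every variant is written to the index. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- Query time: no synonym filter, so a query term is not expanded into extra clauses. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a layout like this, a query for us stays a single term, so a defaultOperator of AND has nothing extra to combine. Documents that were indexed before the change need to be reindexed for the expanded synonyms to appear in the index.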

rfeak
  • Definitely worked. Glad to get this resolved *before* I index 600,000 documents in the live site... – mlissner Jan 15 '12 at 06:07
  • 1
    Please note that this recommendation is for older versions of SOLR only. [The newer SynonymGraphFilter is fine with being used at query time.](https://lucene.apache.org/solr/guide/8_8/filter-descriptions.html#synonym-graph-filter) I have seen it successfully used in production. It's much easier if you don't have to reindex after each synonym change... – Suzana Feb 05 '21 at 11:35
9

To see how synonym search works end to end, follow the step-by-step implementation below (I am using Solr 6.5.*):

Step 1:

Download the country-synonyms.txt file and place it in the following path:

Path: \solr-6.5.1\server\solr\yourCore\conf

yourCore: replace this with the name of your core.

Step 2:

Add the following field type to the managed-schema file in the same path mentioned above:

<fieldType name="country" class="solr.TextField" positionIncrementGap="100" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="false" ignoreCase="true" synonyms="country-synonyms.txt" tokenizerFactory="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Step 3: Add your field (Nationality) with type country to the same file (managed-schema).

<field name="Nationality" type="country" indexed="true" stored="true"/>

Step 4: Restart Solr.

solr restart -p <your solr port>

Step 5:

Now import your data, including the Nationality field.***

Step 6:

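Now test with the queries below. Multi-word values are quoted so that the query parser passes them to the field as a single term:

Query:

  1. Nationality:US
  2. Nationality:USA
  3. Nationality:"United States"
  4. Nationality:"United States of America"

All of the above queries will return the same result.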

Note:*** Import the data only after completing the steps above, including the Solr restart. The new analysis chain is not applied to documents that were already indexed, so existing data must be reindexed (for more details, see AnalyzersTokenizersTokenFilters).

shivadarshan
2

To round out the answers from a newer Solr perspective, I would like to add one thing about synonyms. Recent versions of Solr handle multi-word synonyms properly at both query time and index time.

To use the new synonyms implementation you would have to use a different filter, for example:

<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

Also, if you use them at index time, put the following filter at the end of your analysis chain definition:

<filter class="solr.FlattenGraphFilterFactory"/>
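
Putting the two filters together, a field type might look roughly like the sketch below. The field type name text_syn_graph and the surrounding tokenizer and lowercase filter are assumptions chosen for illustration; only the two graph filters come from the snippets above:

```xml
<!-- Hypothetical field type: the name "text_syn_graph" and the tokenizer choice are assumptions for illustration. -->
<fieldType name="text_syn_graph" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: the token graph from the synonym filter is flattened as the last step, since the index cannot store a graph. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <!-- Query time: the graph is kept as is, so no flatten filter is added here. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

If synonyms are applied at query time only, the synonym filter and the flatten filter can both be dropped from the index analyzer, which avoids reindexing whenever synonyms.txt changes.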

Hopefully, someone will find that useful :)

Rafal