How to use acronyms in Apache Solr?

Question

I use the text_general field type from Solr's provided configuration to store the content of web pages, as follows:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Field:

<field name="content" type="text_general" stored="true" indexed="true"/>

Say, in synonyms.txt I have an entry:

ABC=>Apple Ball Company

I search the content field with q=content:ABC on data where no document contains the words "Apple Ball Company" together.

I still get highlighting snippets for each of the words Apple, Ball and Company in any content containing them, even when they are not in that sequence or do not appear together at all.

I want the highlighting only for the acronym ABC and/or only for the expansion "Apple Ball Company" (if these words come together in same sequence).

score 3 · Accepted Answer · answered Dec 13 '17 at 12:38

There are issues with the SynonymFilterFactory for multi-word synonyms, resulting in "sausagination". It is explained very well here: https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

The reason is that the filter only takes into account the positions of the tokens but not the position length, so the words of a multi-word synonym end up as independent tokens rather than as a sequence. This has been addressed with the SynonymGraphFilter, see https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-SynonymGraphFilter

So use the SynonymGraphFilter instead of the deprecated SynonymFilterFactory, e.g. <filter class="solr.SynonymGraphFilterFactory" synonyms="mysynonyms.txt"/>.
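Applied to the field type from the question, the change could look like the sketch below. It assumes the synonyms stay at query time only, as in the original configuration, and it keeps the question's synonyms.txt file name. If the graph filter were also used in the index analyzer, a solr.FlattenGraphFilterFactory would have to follow it, which is not needed here.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <!-- Index analyzer unchanged: no synonym expansion at index time -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query analyzer: SynonymGraphFilterFactory replaces SynonymFilterFactory -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that an explicit mapping such as ABC=>Apple Ball Company replaces the acronym with its expansion. If documents containing the literal acronym should match as well, an entry like ABC => ABC, Apple Ball Company keeps both forms; whether that fits depends on your data.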

Now `Apple`, `Ball` and `Company` are highlighted only if the same record contains the sequence `Apple Ball Company`. Thanks @drjz, it worked. - S Jayesh, Dec 15 '17 at 06:52
