I allow users to type Russian words in Latin letters. If a user misspells a Russian word typed in Latin letters, I want the Solr spellchecker to suggest the correct word in Cyrillic (Russian words in the index are in Cyrillic). However, if the user misspells a non-Russian word (for example, a brand name), it should be corrected in Latin letters (non-Russian words in the index are in Latin).
For example, the query tilevizor smasung should be corrected to телевизор samsung.
I'm currently using the following configuration:
<fieldType name="spell_ru" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ICUTransformFilterFactory" id="Any-Cyrillic; NFD; [^\p{Alnum}] Remove" />
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="256" />
  </analyzer>
</fieldType>
The query analyzer converts the whole query to Cyrillic letters, so correction of Russian words works, but correction of Latin words does not: tilevizor is corrected to телевизор, but smasung is not corrected to samsung.
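To illustrate what I believe is happening, here is a minimal Python sketch. It is not Solr code: `LATIN_TO_CYRILLIC` is a hypothetical, heavily simplified stand-in for the ICU Any-Cyrillic transform (the real transform may produce different letters), `INDEXED_TERMS` is a two-word toy index, and `suggest` with its `max_distance` of 2 is only an assumed approximation of an edit-distance based spellchecker.

```python
# Hypothetical, simplified stand-in for ICU's Any-Cyrillic transform.
# The real transform has many more rules; this covers only the example words.
LATIN_TO_CYRILLIC = {
    "a": "а", "e": "е", "g": "г", "i": "и", "l": "л", "m": "м", "n": "н",
    "o": "о", "r": "р", "s": "с", "t": "т", "u": "у", "v": "в", "z": "з",
}

# Toy index: Russian words are stored in Cyrillic, brand names in Latin.
INDEXED_TERMS = ["телевизор", "samsung"]


def transliterate(word):
    """Lowercase the word and map each Latin letter to Cyrillic, as the query analyzer does."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in word.lower())


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(query_word, max_distance=2):
    """Return the closest indexed term to the transliterated query word, or None."""
    analyzed = transliterate(query_word)
    distance, term = min((edit_distance(analyzed, t), t) for t in INDEXED_TERMS)
    return term if distance <= max_distance else None


print(suggest("tilevizor"))  # transliterated query is one edit away from an indexed term
print(suggest("smasung"))    # transliterated query shares no letters with the Latin term
```

In this sketch the untransliterated smasung is only two edits away from samsung, but after transliteration it becomes a Cyrillic string that is far from every indexed term, so no suggestion is returned.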
Any ideas on how I can make the spellchecker correct both Cyrillic and Latin words?