I am having a problem enabling Solr highlighting on some of my schema fields.

For example, I have the following field types:

<fieldType name="string" class="solr.StrField" />
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  </analyzer>
</fieldType>

I would like to perform text highlighting on any field of type "String" or "Text". The problem is that I am unable to make Solr highlight fields of type "String"; it only works for type "Text". I do not want to change the actual text of the "String" fields, but I would like Solr to highlight matches in them.

Any thoughts?

I am using Solr 9 with Java 17.

  • Related: [Enabling solr highlighting on field](https://stackoverflow.com/q/67089477/12567365). Lucene `StringField` fields are [not tokenized](https://lucene.apache.org/core/9_4_1/core/org/apache/lucene/document/StringField.html). The entire string is indexed as a single token (contrast that with a `TextField` - which [can be tokenized](https://lucene.apache.org/core/9_4_1/core/org/apache/lucene/document/TextField.html)). – andrewJames Oct 31 '22 at 13:47
  • So how can I use TextField tokenizers to enable highlighting without affecting the actual text of the field (because it's being used as a facet)? – Farooq Alaulddin Oct 31 '22 at 13:55
  • Create another field (using a copy field), make it a text field, apply the relevant tokenizers and filters, and use it for highlighting. One more point: the field should be indexed and stored for highlighting to be possible. – Abhijit Bashetti Oct 31 '22 at 14:13
  • Thank you. One last question: what are the relevant tokenizers and filters, while keeping the text in its original form? – Farooq Alaulddin Oct 31 '22 at 15:34
  • The stored text will always be kept in its original form - so any highlighting will be done against that. You process the text according to _what you want to match_. If you don't want to do anything other than a 1:1 match except for lowercasing, use a WhitespaceTokenizer with a LowercaseFilter. – MatsLindh Oct 31 '22 at 20:16

1 Answer

Use your string fields for EXACT matching and facet displays. They are not a good choice for highlighting.

For highlighting, leave the string field as it is and add a copyField rule to the schema (or managed-schema) that copies its value into a separate field of a text type:

<copyField source="cat" dest="text" maxChars="30000" />

and make sure you define the field you're copying to.
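As a rough sketch of that definition, assuming the destination keeps the name `text` from the copyField above and reuses the `text` field type from the question (the attributes on `cat` are guesses, since the question does not show its declaration):

```xml
<!-- Source field (attributes assumed): left untouched, still used for exact matching and facets -->
<field name="cat" type="string" indexed="true" stored="true" />

<!-- Destination of the copyField. stored="true" gives the highlighter the original text to mark up;
     multiValued="true" is only needed if several values end up in this field -->
<field name="text" type="text" indexed="true" stored="true" multiValued="true" />
```

Queries that should be highlighted then target `text`, while facets keep using `cat`.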

If you're feeling lazy and don't mind your highlight field names ending in _t, Solr's default schema has a dynamic field rule that treats any field whose name ends in _t as a text type, so you can copy into such a field without declaring it first:

<copyField source="cat" dest="dynamic_text_example_t" maxChars="30000" />
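For reference, the rule this shortcut relies on looks roughly like the following in the `_default` configset. This is an assumption about your setup rather than something shown in the question, so verify the `text_general` type name and the attributes against your own managed-schema:

```xml
<!-- Assumed to match the _default configset; check your own managed-schema before relying on it -->
<dynamicField name="*_t" type="text_general" indexed="true" stored="true" multiValued="false" />
```

If `cat` holds several values per document, a single-valued destination like this will likely fail at index time, in which case a multivalued destination is the safer choice.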

You can read all about copying fields in the Solr Reference Guide.

It feels like a lot of work at first, but this is why Solr is fast.