
I am trying to add a custom dictionary for stemming, but have had no luck.

Steps I tried:

1) I added the following lines to /config/script/DataIngest.xml:

<dgidx id="Dgidx" host-id="ITLHost">
  <args>
    .....
    <arg>--stemming-updates</arg>
    <arg>C:/Endeca/Apps/CRS/config/script/stemmingExtension.en.xml</arg>
  </args>
</dgidx>

2) I added the following lines to stemmingExtension.en.xml:

<word_forms_collection_updates>
  <WORD_FORMS>
    <WORD_FORM>shuts</WORD_FORM>
    <WORD_FORM>shirts</WORD_FORM>
  </WORD_FORMS>
</word_forms_collection_updates>

I then ran a baseline update and searched for "shuts", expecting to get the results for "shirts", but I did not.

What's the correct way of setting up custom dictionary words in stemming?

Thanks in advance for your help.

Basavaraj

2 Answers


Which version of the ETL salience component are you using? I remember a similar bug in the OEID 3.0 bundle. Unfortunately, the cause there was that the component used in CloverETL does not call the appropriate method of the Java API to get the stemmed word. You can build a mock-up that calls the Java API directly to see which methods are used.

morepaolo

For Endeca version 3.1.2, try adding the word forms to /MDEX/<version>/conf/stemming/en_word_forms_collection.xml (for English).

Example:

<WORD_FORMS_COLLECTION>
...
  <WORD_FORMS>
    <WORD_FORM>shuts</WORD_FORM>
    <WORD_FORM>shirts</WORD_FORM>
  </WORD_FORMS>
</WORD_FORMS_COLLECTION>
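
Before re-running a baseline update, it may be worth checking that the edited file is still well-formed XML and that both words really sit in the same WORD_FORMS group, since an unclosed tag is easy to miss. The following is a minimal sketch in Python using only the standard library. The function names and the inline SAMPLE string are hypothetical and purely illustrative, and the script says nothing about how Dgidx or the MDEX engine actually consumes the file.

```python
import xml.etree.ElementTree as ET


def load_word_form_groups(xml_text):
    """Return each WORD_FORMS group as a list of its WORD_FORM strings.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML (for example, a missing closing tag).
    """
    root = ET.fromstring(xml_text)
    groups = []
    for group in root.iter("WORD_FORMS"):
        forms = [(wf.text or "").strip() for wf in group.findall("WORD_FORM")]
        groups.append(forms)
    return groups


def groups_containing(groups, term):
    """Return the groups that list `term` as one of their word forms."""
    return [g for g in groups if term in g]


# Hypothetical sample that mirrors the snippet above, without the elided entries.
SAMPLE = """<WORD_FORMS_COLLECTION>
  <WORD_FORMS>
    <WORD_FORM>shuts</WORD_FORM>
    <WORD_FORM>shirts</WORD_FORM>
  </WORD_FORMS>
</WORD_FORMS_COLLECTION>"""

if __name__ == "__main__":
    groups = load_word_form_groups(SAMPLE)
    # Shows every group in which "shuts" appears.
    print(groups_containing(groups, "shuts"))
```

To check a real file, read its contents into a string and pass that to load_word_form_groups in place of SAMPLE.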
KrishPrabakar