
I'm trying to get the word frequency list from the indexer command-line tool, but the words come out unstemmed, even though I set morphology = stem_en in the index settings and search itself works fine on words with the same stem. Is there a way to get that list with stemmed words?

lompy

1 Answer


The only way I can think of is to take the output of indexer and run it through the BuildKeywords API to get the stemmed counts. You can put thousands of keywords in one API call, so it's quite lightweight.
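A minimal Ruby sketch of the merging step, assuming the raw word counts are already in a hash. The `normalize` callable is a hypothetical stand-in for the BuildKeywords call: in practice it would wrap the client library and return one stemmed form per input word, and the exact return shape of that call depends on the client you use. The toy normalizer below is for illustration only and is not a real stemmer.

```ruby
# Merge raw word frequencies into per-stem totals.
# raw_counts: { "word" => count } as parsed from the indexer output.
# normalize:  hypothetical callable that takes an array of words and returns
#             an array of stemmed forms in the same order (e.g. a wrapper
#             around a BuildKeywords call).
def stemmed_frequencies(raw_counts, normalize)
  words = raw_counts.keys
  stems = normalize.call(words)
  totals = Hash.new(0)
  words.zip(stems).each do |word, stem|
    totals[stem] += raw_counts[word]
  end
  totals
end

# Illustration-only stand-in: strips a trailing "ing" or "s".
toy_normalizer = ->(words) { words.map { |w| w.sub(/(ing|s)\z/, "") } }

raw = { "run" => 5, "runs" => 3, "walking" => 2, "walk" => 4 }
result = stemmed_frequencies(raw, toy_normalizer)
```

Passing the normalizer in as a callable keeps the merging logic independent of whichever Sphinx client ends up doing the stemming.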

barryhunter
  • Can you specify what should be done according to your suggestion? I'm using the thinking_sphinx wrapper with Rails. Does it mean I have to use Java or PHP to create that list? I didn't find any references to a Ruby library. – lompy Oct 29 '12 at 13:35
  • Sorry, found it: http://rubydoc.info/github/kpumuk/sphinx/Sphinx/Client:BuildKeywords. Anyway, I'm still confused about how to get the indexer output into this method. – lompy Oct 29 '12 at 13:41
  • You can run indexer via a system call (http://stackoverflow.com/questions/690151/getting-output-of-system-calls-in-ruby), write the output to a temporary file, and then read that. – barryhunter Oct 29 '12 at 20:54
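The system-call and temporary-file steps from the last comment could look roughly like this in Ruby. This is a sketch under assumptions: `build_frequency_file` and `read_frequency_file` are hypothetical helper names, the index name and limit are placeholders, the output is assumed to be one "word count" pair per line, and a real setup may also need the path to the Sphinx config that thinking_sphinx generates. The indexer helper is defined but not invoked here; the parsing is shown against a small sample file instead.

```ruby
require "tempfile"

# Hypothetical helper: asks indexer to write the top `limit` words with their
# frequencies to out_path. Returns true on success, false or nil otherwise.
def build_frequency_file(index_name, out_path, limit = 100_000)
  system("indexer", index_name, "--buildstops", out_path, limit.to_s, "--buildfreqs")
end

# Hypothetical helper: parses "word count" lines into a hash,
# skipping blank or malformed lines.
def read_frequency_file(path)
  counts = {}
  File.foreach(path) do |line|
    word, count = line.split
    next if word.nil? || count.nil?
    counts[word] = count.to_i
  end
  counts
end

# Sample data standing in for real indexer output.
sample = Tempfile.new("freqs")
sample.write("running 12\nruns 7\n\nwalk 3\n")
sample.close
parsed = read_frequency_file(sample.path)
sample.unlink
```

The keys of the parsed hash are the words to send to BuildKeywords, and the values are the counts to add up per stem.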