
Using Lucene, I want to compare a document in the index with the rest of the documents. I found that an easy way would be to submit the document itself as a query. The problem is that I need to OR the terms together and, the most difficult part, boost each term by its term frequency.

I think that if I replace every run of whitespace in the document with ' OR ', Lucene will parse and interpret it. But is there a more sophisticated way to deal with this problem?

And what is the easiest way to boost the terms by their respective frequencies?

synack

1 Answer


It looks like you are trying to re-implement Lucene's MoreLikeThis.
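
If you would still rather build the query by hand, here is a rough sketch of the approach described in the question, in plain Java with no Lucene dependency. It counts how often each token occurs and emits a string in the classic query parser syntax, where `term^n` sets a boost and `OR` joins the clauses. The class name `BoostedOrQuery` and its methods are hypothetical, and the sketch assumes that whitespace splitting plus lowercasing is close enough to whatever analyzer built your index, which may not hold.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: turns raw text into a
// boosted OR query string for the classic query parser.
final class BoostedOrQuery {

    // Characters that carry special meaning in the classic query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Backslash-escape every special character in a single term.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "term^tf OR term^tf ..." in order of first occurrence.
    static String build(String text) {
        Map<String, Integer> termFrequency = new LinkedHashMap<>();
        // Lowercasing also keeps AND/OR/NOT in the text from being read
        // as operators, since the parser only treats uppercase as such.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            termFrequency.merge(token, 1, Integer::sum);
        }
        StringJoiner query = new StringJoiner(" OR ");
        for (Map.Entry<String, Integer> e : termFrequency.entrySet()) {
            query.add(escape(e.getKey()) + "^" + e.getValue());
        }
        return query.toString();
    }
}
```

For example, `BoostedOrQuery.build("lucene index lucene")` yields `lucene^2 OR index^1`, which you could then hand to the query parser. Keep in mind that a long document can produce more clauses than a BooleanQuery accepts by default, so you may need to keep only the most frequent terms.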

jpountz
  • In fact I think that I need something simpler than that. I just want to compare two documents using the tf*idf scheme, i.e., I want to get high scores if these documents share very infrequent terms. – synack Sep 25 '12 at 07:32