Is there any way to modify Lucene's default similarity scoring function to support searching multi-valued fields? That is, for a document that has three "persons" fields, there would be three separate similarity scores, one for each person.

For example, consider indexing a paper as one document, where each of its authors has multiple aliases:

Person 1: David Bowie, David Robert Jones, Ziggy Stardust, Thin White Duke

Person 2: David Letterman

Person 3: David Hasselhoff, David Michael Hasselhoff

When searching for "David", can we return three different similarity scores, where Score(Person 2) > Score(Person 3) > Score(Person 1)?

Furthermore, can we implement an Indri-style MAX or AVG operator, where MAX(document) = Score(Person 2) and AVG(document) = AVG{Score(Person 1), Score(Person 2), Score(Person 3)}?
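
To make the desired behaviour concrete, here is a minimal sketch in plain Java. It does not use any Lucene API. The class `PersonScoreSketch` and its methods are hypothetical names, and the toy score (square root of term frequency divided by square root of token count, with IDF left out) is only loosely modelled on a TF-IDF style similarity and is an assumption, not Lucene's actual scoring. It only illustrates the per-person scores and the MAX/AVG aggregation I am after.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class PersonScoreSketch {

    // Toy score (an assumption): sqrt(term frequency) / sqrt(token count),
    // computed over all aliases of a single person.
    static double score(String term, List<String> aliases) {
        String needle = term.toLowerCase(Locale.ROOT);
        int tokens = 0;
        int hits = 0;
        for (String alias : aliases) {
            for (String tok : alias.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (tok.isEmpty()) {
                    continue;
                }
                tokens++;
                if (tok.equals(needle)) {
                    hits++;
                }
            }
        }
        return tokens == 0 ? 0.0 : Math.sqrt(hits) / Math.sqrt(tokens);
    }

    // One score per "persons" value instead of one score for the whole field.
    static Map<String, Double> scorePersons(String term, Map<String, List<String>> persons) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : persons.entrySet()) {
            scores.put(e.getKey(), score(term, e.getValue()));
        }
        return scores;
    }

    // Indri-style MAX over the per-person scores.
    static double max(Map<String, Double> scores) {
        return scores.values().stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    // Indri-style AVG over the per-person scores.
    static double avg(Map<String, Double> scores) {
        return scores.values().stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // The three persons from the example in this question.
    static Map<String, List<String>> example() {
        Map<String, List<String>> persons = new LinkedHashMap<>();
        persons.put("Person 1", Arrays.asList(
                "David Bowie", "David Robert Jones", "Ziggy Stardust", "Thin White Duke"));
        persons.put("Person 2", Arrays.asList("David Letterman"));
        persons.put("Person 3", Arrays.asList("David Hasselhoff", "David Michael Hasselhoff"));
        return persons;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = scorePersons("David", example());
        scores.forEach((person, s) -> System.out.println(person + " -> " + s));
        System.out.println("MAX(document) = " + max(scores));
        System.out.println("AVG(document) = " + avg(scores));
    }
}
```

With this toy score, Person 2 (one match in two tokens) ranks above Person 3 (two matches in five tokens), which ranks above Person 1 (two matches in ten tokens), so MAX(document) is the score of Person 2.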

Any pointers to which part of the Lucene implementation could be modified would be appreciated. Thanks.

Xin Qian
  • I think this link will be helpful: https://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/package-summary.html#changingSimilarity , but changing the scoring algorithm is a bit drastic. If you can, first retrieve documents with Lucene's default model, then implement what you want on the retrieved documents. That is much simpler. – Alikbar Jan 04 '17 at 19:09
  • Thanks, I've looked at this link already. After some searching, I think DisjunctionMaxQuery will be a workaround, although it requires giving each field a unique name to differentiate them. – Xin Qian Jan 05 '17 at 19:35

0 Answers