Given a set of documents containing text, I'd like to search for a phrase, return every match, and rank the matches. I know how to get Lucene/Solr to indicate which documents match and to do highlighting within a document, but how do I get a ranking that includes multiple matches from the same document? For example, take these two documents, the second of which has two lines:

First document.  It has a single line of text.
Second document.  This text line is quite short.
This is another line containing more text and is a bit longer.

If I search for "text line", I'd like it to find three matches, ranked as follows:

2nd document -> ...This "text line" is quite short.
1st document -> ...It has a single "line of text".
2nd document -> ...another "line containing more text" and is...
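To make the desired ordering concrete, here is a rough sketch in plain Python. It is not a Lucene or Solr solution, only an illustration of the result I'm after. The ranking rule (the shorter the span of words that contains all the query terms, the higher the match ranks) is an assumption inferred from the example above, and `tokenize`, `minimal_windows` and `rank_matches` are hypothetical helper names.

```python
import re

def tokenize(text):
    # Lowercase word tokens. Newlines count as ordinary whitespace,
    # so a match is allowed to span a line boundary.
    return re.findall(r"\w+", text.lower())

def minimal_windows(tokens, terms):
    """Yield (start, end) token spans that contain every query term
    and have no smaller such span nested inside them."""
    wanted = set(terms)
    last_seen = {}    # term -> most recent position seen so far
    prev_start = -1   # start of the last span that was yielded
    for end, token in enumerate(tokens):
        if token not in wanted:
            continue
        last_seen[token] = end
        if len(last_seen) < len(wanted):
            continue  # not every term has been seen yet
        start = min(last_seen.values())
        # If the start did not move, a shorter span with the same start
        # was already yielded, so this one is not minimal.
        if start > prev_start:
            yield start, end
            prev_start = start

def rank_matches(docs, query):
    # Assumed scoring: fewer words in the span means a better match.
    terms = tokenize(query)
    matches = []
    for name, text in docs:
        tokens = tokenize(text)
        for start, end in minimal_windows(tokens, terms):
            snippet = " ".join(tokens[start:end + 1])
            matches.append((name, end - start + 1, snippet))
    return sorted(matches, key=lambda m: m[1])

DOCS = [
    ("1st document", "First document.  It has a single line of text."),
    ("2nd document", "Second document.  This text line is quite short.\n"
                     "This is another line containing more text and is a bit longer."),
]

for name, length, snippet in rank_matches(DOCS, "text line"):
    print(f"{name} ({length} words): {snippet}")
# 2nd document (2 words): text line
# 1st document (3 words): line of text
# 2nd document (4 words): line containing more text
```

The second document shows up twice because it contains two separate spans, each with its own rank. That per-match ranking is what I'd like to get out of the index.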

Is this possible? How?

Chris Leishman
  • I originally had a more complicated question, which included this, here: http://stackoverflow.com/questions/8883390/obtain-metadata-associated-with-matched-content-in-solr-lucene – Chris Leishman Jan 17 '12 at 13:40
  • Why do you want document 2 twice in the results? Maybe you should index each line as a document... – naresh Jan 18 '12 at 09:44
  • That's what I said: every line as a document, if you want matches to be lines. – milan Jan 18 '12 at 10:24
  • I want document 2 in the results twice because it has two different matches that have different rankings. But I can't separate each line, because my source files are a stream of text, and a search for a phrase must match across newline boundaries. – Chris Leishman Feb 22 '12 at 04:59

1 Answer

If you want to have one match per line, then make each line its own document. Don't confuse the term "document" with a single file: a document in the index doesn't have to correspond to one file on disk.

If you want to maintain a link back to the file, just index the file's id as well, in a separate (stored) field.

{ id: "myfile.txt",
  text: "first line" }

{ id: "myfile.txt",
  text: "second line" }
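
As a minimal sketch of how such per-line records could be generated, the Python below splits a file's text into one record per line. It assumes `id` is the schema's uniqueKey (in which case Solr replaces an existing document that has the same key, so each line needs its own value) and therefore keeps the file name in a separate field. The field names `file` and `line_no` and the helper `lines_to_docs` are hypothetical; adapt them to your schema.

```python
import json

def lines_to_docs(file_name, text):
    """Build one index record per non-empty line of `text`."""
    docs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        docs.append({
            "id": f"{file_name}#{line_no}",  # unique per line (assumed uniqueKey)
            "file": file_name,               # link back to the source file
            "line_no": line_no,
            "text": line,
        })
    return docs

docs = lines_to_docs("myfile.txt", "first line\nsecond line\n")
print(json.dumps(docs, indent=2))
```

With this layout every hit is a single line, so a phrase that straddles two lines will not match.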
Xodarap
  • I'm not really talking about files - I'm talking about Lucene documents. – Chris Leishman Feb 22 '12 at 04:54
  • Making each line its own document doesn't work because I actually want to be able to search for phrases that could span multiple lines. If each line is a separate Lucene document, that isn't possible. – Chris Leishman Feb 22 '12 at 04:55