
I'm indexing PDFs with Solr using the ExtractingRequestHandler. I would like to display the page number along with hits in a document, e.g. "term foo was found in bar.pdf on pages 2, 3 and 5."

Is it possible to include page numbers in the query result like this?

javanna
Daniel Hepper

1 Answer


It would require some development effort, but you could achieve this by indexing each page of each document as a separate Solr document and then using field collapsing to group the page hits for each document.

Note that you need a nightly build for this; field collapsing is not implemented in any currently released Solr version.

Update: field collapsing is implemented as of Solr 3.3. More updates are expected in the next major version (Solr 4.0).
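
The per-page approach described above could look roughly like the following. This is only a sketch under stated assumptions: the field names `id`, `doc_id`, `page` and `text` are hypothetical and would have to exist in your schema, and the text of each page is assumed to have been extracted separately before indexing. The `group`, `group.field` and `group.limit` parameters are Solr's field collapsing (result grouping) parameters. The code only builds the documents and the query string and reads a grouped JSON response; it does not talk to a Solr server.

```python
from urllib.parse import urlencode


def page_documents(filename, page_texts):
    """Build one Solr document per PDF page (field names are hypothetical)."""
    return [
        {
            "id": f"{filename}#page={number}",  # unique key per page
            "doc_id": filename,                 # shared by all pages of one PDF
            "page": number,
            "text": text,
        }
        for number, text in enumerate(page_texts, start=1)
    ]


def grouped_query(term, pages_per_document=100):
    """Query string that groups page hits by their parent document."""
    params = {
        "q": f"text:{term}",
        "group": "true",
        "group.field": "doc_id",
        "group.limit": pages_per_document,  # max page hits returned per PDF
        "fl": "doc_id,page",
        "wt": "json",
    }
    return urlencode(params)


def pages_by_document(response):
    """Map each document to the sorted page numbers that matched."""
    groups = response["grouped"]["doc_id"]["groups"]
    return {
        group["groupValue"]: sorted(doc["page"] for doc in group["doclist"]["docs"])
        for group in groups
    }


def describe_hits(term, filename, pages):
    """Format a hit summary; assumes at least one matching page."""
    numbers = [str(page) for page in pages]
    if len(numbers) == 1:
        return f"term {term} was found in {filename} on page {numbers[0]}."
    listed = ", ".join(numbers[:-1]) + " and " + numbers[-1]
    return f"term {term} was found in {filename} on pages {listed}."
```

The documents from `page_documents` would be posted to Solr's update handler, and the string from `grouped_query` appended to the select URL of your core. Passing the parsed JSON response through `pages_by_document` and `describe_hits` then yields summaries in the format asked for in the question.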

Nick Craver
Karl Johansson
  • 1
    At this moment maybe there are new solutions to this problem? – zygimantus Jan 19 '17 at 11:52
  • 1
    @zygimantus I've checked out a few Solr tickets on JIRA that were 10 years old. It's pretty safe to say no. The suggested way is as described in this answer. Other ways would also be possible, but would take longer or be harder, as you'd have to customize Solr itself. – Howie Jan 30 '18 at 11:36