
I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get install solr-tomcat) and it seems to be working fine. I'm having some difficulty finding any coherent info on how to index documents, though. I'm new to Solr, so bear with me! I have a folder (/mnt/folder) that is a mounted Windows share, which contains Word and PDF files that I would like indexed. What's the easiest way to get Solr to index the entire folder?

The documentation for Solr is pretty poor; it's impossible to find any decent tutorials on getting things done with it, so any help is greatly appreciated!

S

javanna
Shane

3 Answers


Take a look at the Solr wiki; it's pretty thorough documentation.

In particular see the ExtractingRequestHandler, which allows you to index binary files like Word and PDF documents. Here's an introduction to the topic.
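For the folder in the question, a rough sketch of the idea might look like the following: walk the directory and POST each Word or PDF file to the ExtractingRequestHandler. This is only an illustration under several assumptions that are not from the original post: that the handler is mapped to `/update/extract` in solrconfig.xml, that the solr-tomcat package serves Solr at `http://localhost:8080/solr`, that the schema accepts the fields Tika extracts, and that the file path is acceptable as the document `id`. The helper names are made up for this example.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumption: Tomcat on port 8080 with the handler mapped to /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"
DOC_EXTENSIONS = {".doc", ".docx", ".pdf"}


def find_documents(root):
    """Yield paths of Word and PDF files under root, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in DOC_EXTENSIONS:
                yield os.path.join(dirpath, name)


def build_extract_url(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """Build the request URL, using the file path as the document id."""
    params = {"literal.id": path}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def index_file(path, base_url=SOLR_EXTRACT_URL, commit=False):
    """POST the raw file bytes to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(path, base_url, commit),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_folder(root, base_url=SOLR_EXTRACT_URL):
    """Index every matching file under root, committing on the last one."""
    paths = list(find_documents(root))
    for position, path in enumerate(paths):
        index_file(path, base_url, commit=(position == len(paths) - 1))
    return len(paths)
```

Calling `index_folder("/mnt/folder")` would then send each document in turn and commit once at the end, which is usually cheaper than committing after every file.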

If the wiki isn't enough for you, there's also a great book about Solr.

Mauricio Scheffer
  • Lucid link is not working. The video can be found on YouTube, however: http://www.youtube.com/watch?v=ifgFjAeTOws&list=PLsj1Ri57ZE94lISrJuy7W8COc2RNFC1Fl&index=14 – Avec Mar 07 '14 at 07:40
  • The only documentation I have found that is truly useful is the PDF at http://lucene.apache.org/solr/resources.html#documentation – HeadCode Jul 23 '19 at 21:04

I have run into the same challenges with the core documentation, but I came across this very useful reference guide from LucidImagination, which helped to clarify a lot of things about Solr:

http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide

grunk
Jay Hung
  • I think this would substitute the above: http://docs.lucidworks.com/display/solr/Apache+Solr+Reference+Guide – panza Aug 09 '13 at 10:47
  • @panza That is the same link and it no longer goes anywhere useful. – HeadCode Jul 23 '19 at 20:47
  • @HeadCode it is the same link because the original post was edited around the same time I actually wrote my reply. Lucidworks has some Solr reference for Fusion Server here: https://doc.lucidworks.com/fusion-server/4.2/solr-reference-guide/7.5.0/index.html – panza Jul 24 '19 at 21:10

Processing rich documents with Solr: http://wiki.apache.org/solr/UpdateRichDocuments

disco crazy
  • Oh, I just realized that this method has been replaced by the ExtractingRequestHandler, as Mauricio suggested. (Quote from the Solr wiki: _This page covers the RichDocumentHandler as created by Eric Pugh and Chris Harris. Solr's Tika integration, which will replace the RichDocumentHandler is described at ExtractingRequestHandler. This page is being preserved here for those users who currently use the RichDocumentHandler_) – disco crazy Aug 26 '11 at 08:21