I'm using Solr to index journal articles. With the out-of-the-box configuration, Solr indexes the text of the documents, but I want to use Grobid to pull out the authors, title, affiliations, etc. I have Grobid up and running as a service.

I added

<str name="tika.config">/path/to/tika-config.xml</str>

to the requestHandler for /update/extract in solrconfig.xml
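
For context, here is a minimal sketch of where that line would sit, assuming the handler definition resembles the one in Solr's sample configs. The `defaults` entries are illustrative assumptions, not taken from my actual setup, and the path is the same placeholder as above:

```xml
<!-- Sketch only: handler attributes and defaults follow Solr's sample solrconfig.xml -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <!-- Points the handler at a custom Tika configuration (placeholder path) -->
  <str name="tika.config">/path/to/tika-config.xml</str>
  <lst name="defaults">
    <!-- Illustrative defaults; keep whatever your config already has here -->
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

The `tika.config` entry is a direct child of the handler element rather than part of the `defaults` list.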

The tika-config looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.journal.JournalParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>

I'm getting a ClassNotFoundException when I try to import a document, but I can't figure out where to set the classpath to fix it.

Wolfgang Fahl
betseyb

1 Answer

As mentioned on the Solr users' list, the latest version of Solr (6.0.0) bundles a version of Tika (1.7) that predates Grobid support, which was added in Tika 1.11. That means the JournalParser class is not in the Tika jars that ship with Solr, so no classpath setting will find it. To follow the upgrade to Tika 1.13, see SOLR-8981.

Tim Allison