
I am trying to use Tika in Python to parse PDF files. I am using Python 2.7 on a Mac, and I cannot get it to work. I installed it and then ran:

from tika import parser
raw = parser.from_file('...file')

I get this error (edited for brevity):

Retrieving http://search.maven.org/remotecontent ... to /var/folders/... [MainThread  ] [INFO ]  Retrieving http:// ... [MainThread  ] [WARNI]  Failed to see startup log message; retrying...
...
2019-04-08 14:53:05,910 [MainThread  ] [ERROR]  Tika startup log message not received after 3 tries.
2019-04-08 14:53:05,916 [MainThread  ] [ERROR]  Failed to receive startup confirmation from startServer.

My question is very similar to "Use tika with python, runtimeerror: unable to start tika server". The top answer there, though, doesn't work for me. I have installed Java 8, but it still doesn't work. What should I do?

bill999
  • If you grab the Tika App runnable jar manually and try to run that directly (e.g. `java -jar apache-tika-1.20.jar`), does that work fine? – Gagravarr Apr 08 '19 at 22:24
  • I might be doing things wrong. I went to https://tika.apache.org/download.html and downloaded tika-server-1.20.jar. I then ran `java -jar 'filepath to tika-server-1.20.jar'`. I got this error: Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/tika/server/TikaServerCli : Unsupported major.minor version 52.0. I did the same thing with `tika-app-1.20.jar` and got a similar error (Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/tika/cli/TikaCLI : Unsupported major.minor version 52.0). – bill999 Apr 09 '19 at 12:48
  • That means your version of Java is too old. Upgrade! Apache Tika needs Java 8+ – Gagravarr Apr 09 '19 at 14:05
  • I thought I had upgraded (I did so yesterday). When I go to Java Control Panel, it is Version 8 Update 201 (build 1.8.0_201-b09). But when I go to Terminal and do `java -version`, it says `java version "1.6.0_65"`. What to do? – bill999 Apr 09 '19 at 14:14
  • Uninstall the old version of Java 6? Helping you with Java on a Mac isn't really a Tika problem though, so you really need a new question! – Gagravarr Apr 09 '19 at 14:16
  • Thanks! I did so (and installed the Java JDK), and everything works now. – bill999 Apr 09 '19 at 14:45

1 Answer


In case you still have this problem, or for anyone else coming here: even though you installed Java 8 (from Oracle or similar), the terminal still sees the old Java that ships with OS X.

You need to tell the terminal to use the new Java you have just installed. Put this into your .bash_profile:

export JAVA_HOME="/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/"

If that path does not exist on your machine, check System Preferences > Java > Java > View > Path. There you can see the path for Java; copy everything up to Home/ and paste it between the quotes in export JAVA_HOME="".

Restart your terminal and Tika should work now.
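To confirm which Java the terminal (and therefore Python) actually picks up, something like the following might help. It is only a rough sketch: the helper names `parse_java_major` and `java_major_on_path` are made up for illustration, and the "Java 8+" threshold is taken from the comments above rather than from anything the tika package checks itself.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (e.g. 1.6.0_65, 1.8.0_201).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def java_major_on_path(java_cmd="java"):
    """Run `java -version` and parse it; return None if java cannot be run."""
    try:
        # `java -version` writes to stderr, so fold stderr into the captured output.
        output = subprocess.check_output(
            [java_cmd, "-version"], stderr=subprocess.STDOUT
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_java_major(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    major = java_major_on_path()
    if major is None:
        print("Could not determine the Java version")
    elif major < 8:
        print("Java %d is too old; the comments above say Tika needs Java 8+" % major)
    else:
        print("Java %d should be new enough for the Tika server" % major)
```

If this still reports Java 6 after editing .bash_profile, the new JAVA_HOME is probably not being picked up by the shell that launches Python.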

Dr. Duke