
I am using tika to extract text from PDFs in Python, but it downloads the .jar on every run, which is time-consuming.

[MainThread  ] [INFO ]  Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to /tmp/tika-server.jar.

This happens every time I run the code. Is there a way to download it manually once and stop tika from doing it every time?

Siddharth Das
  • Pop it wherever you want, and pass the `TIKA_SERVER_JAR` environment variable to specify it? See https://github.com/chrismattmann/tika-python#environment-variables – Gagravarr Jun 22 '19 at 07:26
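A minimal sketch of the comment's suggestion, assuming the jar has already been downloaded once to a persistent directory (`~/tika` here is a hypothetical choice; the log above shows the default landing in `/tmp`). tika-python reads its environment variables when the module is imported, so they have to be set before `from tika import parser`. `TIKA_SERVER_JAR` comes from the comment; `TIKA_PATH` (the directory tika-python caches the jar in) is taken from the same README section and is an assumption about your installed version. Depending on the version, tika-python may also verify the jar against a `tika-server.jar.md5` file stored next to it.

```python
import os
from pathlib import Path

# Hypothetical persistent location; any directory that survives reboots works.
JAR_DIR = Path.home() / "tika"


def configure_tika(jar_dir: Path = JAR_DIR) -> Path:
    """Point tika-python at a locally stored tika-server.jar.

    Must run before `tika` is imported, because tika-python reads these
    environment variables at import time.
    """
    jar_path = jar_dir / "tika-server.jar"
    # Per the comment: tell tika-python where the jar lives
    # (a local path here; verify this against your installed version).
    os.environ["TIKA_SERVER_JAR"] = str(jar_path)
    # Assumption: TIKA_PATH is the directory tika-python caches the jar in.
    os.environ["TIKA_PATH"] = str(jar_dir)
    return jar_path


def extract_text(pdf_path: str) -> str:
    """Extract text from a PDF using the locally configured jar."""
    configure_tika()
    from tika import parser  # imported only after the environment is set

    parsed = parser.from_file(pdf_path)
    return parsed.get("content") or ""
```

Keeping the `tika` import inside `extract_text` guarantees the variables are in place first; setting them in your shell (for example in `.bashrc`) achieves the same without code changes.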



I know it's been a while and you have probably figured something out already, but for others like me who are still looking for a solution, I would like to suggest another question in which the asker presents his own working approach.

Moreover, I noticed that tika needs internet access only on the very first run, so if you deny it internet access after everything is set up, it won't waste time downloading new files.
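If you would rather do the one-time setup explicitly than rely on tika-python's first run, you can fetch the jar yourself and skip the download whenever the file is already present. A minimal standard-library sketch; the URL is the one from the log in the question (tika-server 1.19), and `fetch_jar_once` and the destination path are hypothetical names chosen for illustration. Point `TIKA_SERVER_JAR` at the resulting file as the comment on the question describes.

```python
import urllib.request
from pathlib import Path

# URL taken from the log line in the question (tika-server 1.19).
TIKA_JAR_URL = (
    "http://search.maven.org/remotecontent"
    "?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar"
)


def fetch_jar_once(dest: Path, url: str = TIKA_JAR_URL) -> Path:
    """Download the jar to `dest` unless it already exists; return the path."""
    if dest.is_file():
        return dest  # downloaded on an earlier run: no network access needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name first so an interrupted transfer
    # never leaves a truncated jar at `dest`.
    partial = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, partial)
    partial.replace(dest)
    return dest
```

Run `fetch_jar_once(Path.home() / "tika" / "tika-server.jar")` once while online; later calls return immediately without touching the network.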

Ajjax