
The ultimate goal of this project is to drop the jar into a directory, have it run Tesseract there, and produce a results directory and an output txt file. I am having some issues with Tesseract, though. I am working with Tess4J in Java with Maven and I want to package my code as an executable jar. The project works fine as a desktop app, but whenever I try to run it with `java -jar fileName.jar` (after exporting to a jar) it gives me the error

Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
Failed loading language 'eng'
...

I looked online and couldn't really find out how to set up Tesseract for a jar and get the paths right. I use Maven and have the Tess4J dependency in my pom file (tess4j, version 3.0), and I have the tessdata in my project.

I am fairly new to Maven and jar files and have never used Tesseract before, but as far as I can tell from the internet I set it up correctly.

Does anyone know how to make Tess4J point to the tessdata directory in my project and use a dynamic path, so I can move the jar and use it on multiple computers and in different locations?

This is how I call Tesseract

    Tesseract instance = new Tesseract();
    instance.setDatapath("src/main/resources");
    String result = instance.doOCR(imageFile);
    String fileName = imageFile.getName().replace(".jpg", "");
    System.out.println("Parsed Image " + fileName);
    return result;

EDIT

This is how I tried to set the environment variable TESSDATA_PREFIX in my code

    String dir = System.getProperty("user.dir");
    System.out.println("current dir = " + dir);
    ProcessBuilder pb = new ProcessBuilder("CMD", "/C", "SET");
    Map<String, String> env = pb.environment();
    env.put("TESSDATA_PREFIX", dir + "\\tessdata");
    Process p = pb.start();

but this had no discernible effect. I still got the same error
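
A likely reason, sketched with the JDK only: `ProcessBuilder.environment()` configures the environment of the child process that the builder starts, not the environment of the running JVM, which is where the native Tesseract library looks up `TESSDATA_PREFIX`. The class name `ChildEnvDemo`, the method name, and the placeholder command are hypothetical and exist only for this illustration.

```java
import java.util.Map;

public class ChildEnvDemo {

    /**
     * Sets a variable on a ProcessBuilder's environment map and then returns
     * what the current JVM sees for that variable.
     */
    public static String ownValueAfterChildPut(String name, String value) {
        // The command is a placeholder; the process is never started here.
        ProcessBuilder pb = new ProcessBuilder("placeholder-command");
        Map<String, String> childEnv = pb.environment();
        childEnv.put(name, value); // affects only a child started from pb

        // The JVM's own environment is untouched by the put above.
        return System.getenv(name);
    }

    public static void main(String[] args) {
        String seen = ownValueAfterChildPut("TESSDATA_PREFIX_DEMO", "C:\\some\\dir");
        System.out.println("JVM still sees: " + seen);
    }
}
```

Under that reading, the variable has to be set before the JVM is launched (for example in the shell or a launcher script), or the data path has to be passed with `setDatapath` instead.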

EDIT 2

According to the error message, I need to set it to the parent directory of tessdata. I also tried this, to no avail.

EDIT 3

After a ton of searching and trying to fix it, I am not sure it is even possible. The `doOCR` method in Tess4J takes a BufferedImage or a File, which would be all right if my images weren't dynamic, but they are, so I can't really store them in the jar. Not to mention that TESSDATA_PREFIX still won't set. If anyone has any ideas I am still all ears, and I will keep looking for a solution, but I'm not sure it will work at all.

Ian
  • Might this help? http://stackoverflow.com/questions/18095708/tess4j-doesnt-use-its-tessdata-folder – Shmulik Klein Mar 22 '16 at 23:41
  • @ShmulikKlein Nope, that didn't work for me. I'll add an edit showing how I set the environment variable. I got the same error. – Ian Mar 23 '16 at 15:20
  • So the problem is that I have the tessdata in my project hierarchy. I can't really pull it out, because a system may not have it, so I need to find a way to still load the tessdata while keeping the jar executable. – Ian Mar 23 '16 at 16:01

2 Answers


You can call the `instance.setDatapath` method to point Tesseract at the location of your tessdata folder.

http://tess4j.sourceforge.net/docs/docs-3.0/

nguyenq
  • Yeah, I already do that. The problem is that jars don't have a "folder". – Ian Mar 24 '16 at 13:40
  • If you packaged `tessdata` in your JAR file, you'd need to extract it first to the local filesystem and set the data path to that. – nguyenq Mar 24 '16 at 23:19
  • How would I go about doing that? – Ian Mar 25 '16 at 13:40
  • See http://stackoverflow.com/questions/17745788/how-to-copy-files-out-of-the-currently-running-jar or http://stackoverflow.com/questions/11472408/extracting-a-file-from-the-currently-running-jar-through-code/ – nguyenq Mar 25 '16 at 13:56
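
A minimal sketch of the extraction step suggested in the comments above, using only the JDK. The class name `TessdataExtractor`, the `tessdata/` resource prefix, and the temp-directory prefix are assumptions for illustration and are not part of Tess4J. The file names to copy (for example `eng.traineddata`) are supplied by the caller, because this sketch does not try to list the contents of a directory inside the JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TessdataExtractor {

    /**
     * Copies the named classpath resources found under "tessdata/" into a
     * fresh temporary directory and returns that directory, which is the
     * parent of the extracted "tessdata" folder.
     */
    public static Path extract(ClassLoader loader, String... fileNames) throws IOException {
        Path parent = Files.createTempDirectory("tess4j-data");
        Path tessdata = Files.createDirectories(parent.resolve("tessdata"));
        for (String name : fileNames) {
            // Resources inside a JAR are read as streams, not as files.
            try (InputStream in = loader.getResourceAsStream("tessdata/" + name)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: tessdata/" + name);
                }
                Files.copy(in, tessdata.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return parent;
    }
}
```

The returned path could then be passed to `instance.setDatapath(...)` as a string. Whether Tess4J expects the parent of `tessdata` or the `tessdata` directory itself may depend on the version, so `parent.resolve("tessdata")` might be the value to pass instead.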

It randomly started working when I

  1. put the tessdata folder in the same directory as my jar

  2. changed the setDatapath to the following

    Tesseract instance = new Tesseract();
    instance.setDatapath(".");
    String result = instance.doOCR(imageFile);
    String fileName = imageFile.getName().replace(".jpg", "");
    System.out.println("Parsed Image " + fileName);
    return result;
    

  3. exported from Eclipse by right-clicking the project, choosing Export > Java > Runnable JAR file, and selecting the option "Extract required libraries into generated JAR".

(Side note: the environment-variable code I tried earlier no longer needs to be in the project.)

I really thought I had tried this, but I guess something must have been wrong. I removed tessdata from my project and will have to include it wherever the jar is run. I'm not really sure why it started working, but I'm glad it did.
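
One caveat with `"."` is that it resolves against the working directory the jar is launched from, which is not necessarily the directory containing the jar. A hedged sketch of resolving the jar's own directory with the JDK follows; the class name `JarLocation` and the fallback to the working directory are assumptions made for this example.

```java
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

public class JarLocation {

    /**
     * Returns the directory holding the JAR (or class folder) that the given
     * class was loaded from. Falls back to the current working directory
     * when the code source is unknown.
     */
    public static Path directoryOf(Class<?> anchor) throws URISyntaxException {
        CodeSource source = anchor.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return Paths.get("").toAbsolutePath();
        }
        Path location = Paths.get(source.getLocation().toURI());
        // For a packaged app this is the .jar file; from an IDE it is a classes directory.
        return Files.isRegularFile(location) ? location.getParent() : location;
    }
}
```

With this, something like `instance.setDatapath(JarLocation.directoryOf(Main.class).toString())` (where `Main` stands for any class in the jar) would keep finding a `tessdata` folder placed next to the jar regardless of where the command is run from.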

Ian