11

I created a Java application that uses Tesseract to convert a given image or PDF to a string. When I run it on my machine as a JUnit unit test, it works fine. When I run the full system, a RESTful API served by Tomcat that receives the image and runs Tesseract, it gives me the following error:

23:22:36.511 [http-nio-9999-exec-3] ERROR net.sourceforge.tess4j.Tesseract - null
java.lang.NullPointerException: null
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Png(PdfUtilities.java:107)
    at net.sourceforge.tess4j.util.PdfUtilities.convertPdf2Tiff(PdfUtilities.java:48)
    at net.sourceforge.tess4j.util.ImageIOHelper.getIIOImageList(ImageIOHelper.java:343)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:213)
    at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:197)
    at ocr.OcrUtil.getString(OcrUtil.java:54)
    at com.tapd.server.api.handlers.IRSHandler.uploadIRSImage(IRSHandler.java:65)
    at com.tapd.server.api.WebAPIService.updateParentIrsForm(WebAPIService.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:309)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:292)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1139)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:460)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:386)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:334)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:221)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Unknown Source)
[2016-09-14 23:22:36,512] [ERROR] java.lang.NullPointerException

My guess is that the tessdata folder is not in the right place: once the application is packaged into a JAR and run by Tomcat, the folder ends up somewhere Tesseract does not look. I couldn't figure out where it should be located, and I have double-checked that all JARs are deployed correctly.

Edit: It appears that Tesseract can't handle the path when the file is on a remote server such as AWS S3. So the question is why, and how can I make it use a path from S3? (Yes, the file is public.)

Adi

3 Answers

5

My guess is that a GhostscriptException is being thrown but is not logged properly, and that this is what causes the NullPointerException:

https://github.com/nguyenq/tess4j/blob/212d72bc2ec8b3a4d4f5a18f1eb01a0622fc5521/src/main/java/net/sourceforge/tess4j/util/PdfUtilities.java#L107

106        } catch (GhostscriptException e) {
107            logger.error(e.getCause().toString(), e);
108        } finally {

On line 107, e.getCause() is (probably) null, and calling toString() on null throws an NPE.

(from the specs - getCause can be null: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#getCause(), GhostscriptException is also allowing the cause to be null: http://grepcode.com/file/repo1.maven.org/maven2/org.ghost4j/ghost4j/1.0.0/org/ghost4j/GhostscriptException.java)
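The failure mode can be reproduced without Tess4J or Ghostscript. The following is a minimal sketch, assuming a plain Exception created without a cause as a stand-in for GhostscriptException; the class and method names are hypothetical and not part of any library. It contrasts the pattern from line 107 with the null-safe String.valueOf alternative.

```java
class NullCauseDemo {

    /** Mirrors the pattern on line 107: dereferences the cause without a null check. */
    static String unsafeDescribe(Exception e) {
        return e.getCause().toString(); // throws NullPointerException when there is no cause
    }

    /** Null-safe variant: String.valueOf(Object) returns "null" for a null reference. */
    static String safeDescribe(Exception e) {
        return String.valueOf(e.getCause());
    }

    public static void main(String[] args) {
        // Stand-in for a GhostscriptException that was created without a cause.
        Exception noCause = new Exception("Ghostscript failed");

        System.out.println("safe:   " + safeDescribe(noCause));
        try {
            unsafeDescribe(noCause);
        } catch (NullPointerException npe) {
            // The NPE replaces the original exception, so its message is never logged.
            System.out.println("unsafe: NPE hid the message '" + noCause.getMessage() + "'");
        }
    }
}
```

In the unsafe variant, the NullPointerException is thrown from inside the catch block, which is why the log shows an NPE at PdfUtilities.java:107 instead of the underlying Ghostscript error.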

To verify this answer (without recompiling the whole tess4j) you could start your program in the debug mode and put a breakpoint at line 107. This will give you information about the real Exception.

Piotr Reszke
  • I suggest replacing `e.getCause().toString()` by `String.valueOf(e.getCause())` in OP's code to be safe in this case. – Axel Sep 19 '16 at 07:42
  • I managed to get as far as understanding that GhostscriptException is null but the real question is why? how can I resolve it? and why when I run it locally(junit) it doesn't happen? – Adi Sep 19 '16 at 07:44
  • "understanding that GhostscriptException is null" - this is not correct. The GhostscriptException is not null, the GhostscriptException is a valid instance of the Exception. Only the ghostscriptException.getCause() is null. To address this problem start your app in debug mode and check what is the exception message - there should be more details. – Piotr Reszke Sep 19 '16 at 07:48
  • To fully resolve this issue you would have to raise a bug against tess4j: https://github.com/nguyenq/tess4j/issues (and maybe send a pull request). If you need some extra guidance on how to solve this temporarily you can call me on chat – Piotr Reszke Sep 19 '16 at 08:01
  • @Axel - the problem here is that it's a third party lib (PR is required to fix this bug) – Piotr Reszke Sep 19 '16 at 08:38
  • @Adi - I've submitted an issue against the tess4j library https://github.com/nguyenq/tess4j/issues/41 – Piotr Reszke Sep 19 '16 at 19:27
  • @Piotr R - Thank you very much! – Adi Sep 20 '16 at 06:05
2

As @Piotr R mentioned, the error was that ghostscriptException.getCause() is null. The reason is that the path configured in the File object passed to Tesseract was not a valid one. Tesseract's definition of "valid" is narrower than you might expect: it considers only a local path valid, so pointing it at a file located on AWS S3, even a public one, makes it throw an error. The solution was to save the file locally and delete it once Tesseract is done.
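The save-locally-then-delete approach could look roughly like the sketch below. It uses only the JDK; the class name LocalCopyOcr, the FileTask interface, and the withLocalCopy helper are hypothetical names introduced for illustration. The OCR call itself is left to the caller, so the helper is not tied to any particular Tess4J version.

```java
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LocalCopyOcr {

    /** A task that works on a local file and may throw (for example, an OCR call). */
    @FunctionalInterface
    interface FileTask<T> {
        T apply(File localFile) throws Exception;
    }

    /**
     * Copies the remote resource to a temporary local file, runs the task on it,
     * and deletes the temporary file afterwards, even if the task fails.
     */
    static <T> T withLocalCopy(URL source, String suffix, FileTask<T> task) throws Exception {
        Path tmp = Files.createTempFile("ocr-input-", suffix);
        try {
            try (InputStream in = source.openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return task.apply(tmp.toFile());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        // Demo with a file: URL so the sketch runs offline; a public S3 URL would be passed the same way.
        Path sample = Files.createTempFile("sample-", ".txt");
        Files.write(sample, "hello".getBytes(StandardCharsets.UTF_8));
        try {
            Long size = withLocalCopy(sample.toUri().toURL(), ".txt", File::length);
            System.out.println("Processed " + size + " bytes from a local copy");
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

With Tess4J on the classpath, the task would presumably be something like `file -> tesseract.doOCR(file)`, where `tesseract` is an already configured Tesseract instance. Keeping the suffix (for example ".pdf") matters if the library decides how to read the file based on its extension.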

Adi
  • @Piotr R I don't have the stack trace of the GhostscriptException, nor can I debug it, as it is an external library. The way I access S3 is not relevant, since Tesseract can't access files that are not stored locally; that is exactly the answer I was looking for, and the solution. That being said, I really appreciate your help and support. Don't worry: when I mark my own answer as correct, I don't get the bounty. – Adi Sep 23 '16 at 07:47
0

Resources I used: Windows 10 (tried on Windows Server 2016 as well), Java, Maven

Status: works on my local machine as well as on a VM

  1. Download Tess4J-3.4.8 from here http://tess4j.sourceforge.net/ and set your environment variable path under Advanced System Settings.
  2. Get the dependencies from Maven:

     <dependency>
         <groupId>net.sourceforge.tess4j</groupId>
         <artifactId>tess4j</artifactId>
         <version>4.5.1</version>
     </dependency>
     <dependency>
         <groupId>org.ghost4j</groupId>
         <artifactId>ghost4j</artifactId>
         <version>1.0.1</version>
     </dependency>
     <dependency>
         <groupId>net.sourceforge.lept4j</groupId>
         <artifactId>lept4j</artifactId>
         <version>1.7.0</version>
     </dependency>

  3. Get libtesseract302.dll from here http://api.256file.com/libtesseract302.dll/en-download-56466.html and copy it to the "C:\Windows\System32" folder. Do not forget to set your environment variable path under Advanced System Settings.
  4. Download and install the Visual C++ 2015 Redistributable or the VC++ 2017 Redistributable (I installed both) from here https://programmer.help/blogs/net.sourceforge.tess4j.tesseractexception-java.lang.nullpointerexception.html and then restart your PC.
  5. To be on the safe side, you can add some JAR files if you don't already have them locally (please see the image). Do not forget to set the environment variable path for the JARs under Advanced System Settings.

enter image description here

desertnaut
Mike ASP