I am setting up a Java project in which I use PDFBox to extract images from PDFs. Since I already use tika-app for other functions, I decided to use the PDFBox that is bundled inside tika-app-1.20.jar.

I first tried adding jai-imageio-core-1.3.1.jar explicitly. Since tika-app already comes bundled with this jar, I also tried with the tika-app jar alone. Both attempts fail with the same error.

The line that throws the error:

PDXObject object = resources.getXObject(cosName);

The stack trace of the error:

org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:163)
    at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:115)
    at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:64)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:236)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:140)
    at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
    at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:426)

I am quite sure that jai-imageio-core is present in the tika-app jar, yet it does not seem to be visible when I run the code.

Santhosh

2 Answers

I stumbled upon this error as well; it is mentioned in the PDFBox documentation. You need to add the following dependencies to your pom.xml:

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.3.0</version>
</dependency>

<!-- Optional for you; just to avoid the same error with JBIG2 images -->
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.3</version>
</dependency>

If you are using Gradle:

dependencies {
    implementation 'com.github.jai-imageio:jai-imageio-core:1.4.0'
    implementation 'com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0'

    // Optional for you; just to avoid the same error with JBIG2 images
    implementation 'org.apache.pdfbox:jbig2-imageio:3.0.3'
}
Robin
    Here is the link on Maven Central: https://search.maven.org/artifact/com.github.jai-imageio/jai-imageio-jpeg2000/1.4.0/bundle – Jonathan Hult Oct 19 '21 at 21:10

It turns out that jai-imageio-core alone is not enough: an additional jar, jai-imageio-jpeg2000, is required to read JPEG 2000 (JP2K) images.
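
To check whether a JPEG 2000 reader is actually visible at runtime, you can ask ImageIO directly. The following is a minimal sketch, not taken from PDFBox itself. The class name ImageReaderCheck and the helper hasReader are hypothetical, and the format names "JPEG2000" and "JBIG2" are assumptions about what the PDFBox filters look up (the trace above goes through Filter.findImageReader). It uses only standard javax.imageio calls.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import java.util.Iterator;

// Hypothetical diagnostic class; not part of PDFBox or Tika.
public class ImageReaderCheck {

    // Returns true if at least one ImageIO reader is registered
    // for the given format name.
    public static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Ask ImageIO to re-scan the classpath for plugin jars.
        ImageIO.scanForPlugins();

        // Format names below are assumed to match what PDFBox requests.
        System.out.println("JPEG2000 reader available: " + hasReader("JPEG2000"));
        System.out.println("JBIG2 reader available: " + hasReader("JBIG2"));
    }
}
```

If the first line reports false when run with the same classpath as your application, the jai-imageio-jpeg2000 plugin is most likely not being picked up, whichever jars are nominally included.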

Santhosh