
I want to convert each PDF page into an image (like a screenshot) and then upload that image to a storage service. This is my current code:

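  // Needs: java.awt.image.BufferedImage, java.io.ByteArrayOutputStream, java.io.IOException,
  // java.util.concurrent.ConcurrentHashMap, javax.imageio.ImageIO,
  // org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.rendering.ImageType,
  // org.apache.pdfbox.rendering.PDFRenderer
  private void getImageBytes(PDDocument document, int pageIndex, int dpi, ConcurrentHashMap<String, byte[]> imgsToUpload, String imgKey) throws IOException {
    // Render the requested page to an RGB image at the given resolution.
    PDFRenderer pdfRenderer = new PDFRenderer(document);
    BufferedImage bim = pdfRenderer.renderImageWithDPI(pageIndex, dpi, ImageType.RGB);
    // Encode the image as PNG in memory; a ByteArrayOutputStream needs no flush() or close().
    final ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(bim, "png", os);
    // Store the PNG bytes under the given key for the later upload.
    imgsToUpload.put(imgKey, os.toByteArray());
  }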

I added jbig2-imageio by putting this snippet into pom.xml:

        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox</artifactId>
            <version>2.0.19</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/jbig2-imageio -->
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>jbig2-imageio</artifactId>
            <version>3.0.3</version>
        </dependency>

But the generated images are still blank, and this error is logged: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

So, what did I miss? I assumed that adding the dependency would resolve that error.

Should I use pdfbox-tools.imageIo instead of javax.imageio?

PS: I am new to Java, so it might be a configuration issue.

Tilman Hausherr
Mu-Majid
  • Is this a straightforward standalone program? Or something in Tomcat? – Tilman Hausherr May 07 '20 at 17:40
  • @TilmanHausherr, it is a Spring Boot application running Tomcat as the server. Basically, this endpoint extracts all form fields from the PDF and also renders each page as an image. – Mu-Majid May 07 '20 at 21:02
  • I remember reading this about Tomcat... and I found that it was discussed before, but with no solution: https://mail-archives.apache.org/mod_mbox/pdfbox-users/201808.mbox/browser I wonder whether it works if you put that jar file in the Tomcat directory where the Tomcat jar files are? – Tilman Hausherr May 08 '20 at 04:43
  • I just restarted my PC and it worked; I really don't know what was wrong. @TilmanHausherr, the images now have black stains, though. I found one of your answers saying that this is a bug in JAI that has been fixed but not released. – Mu-Majid May 08 '20 at 12:11
  • Yeah, you need to do your own build with the modifications mentioned. IIRC the main modification is in the repository. – Tilman Hausherr May 08 '20 at 13:03

1 Answer


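You need to add these dependencies to your pom file to solve the issue. jbig2-imageio only decodes JBIG2 images; the error is about JPEG2000 images, which need the JAI Image I/O JPEG2000 plugin: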

<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-core</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>com.github.jai-imageio</groupId>
    <artifactId>jai-imageio-jpeg2000</artifactId>
    <version>1.4.0</version>
</dependency>

https://pdfbox.apache.org/2.0/dependencies.html#optional-components
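
To check whether the plugins are actually visible at runtime, a minimal sketch like the one below may help. It only uses the standard javax.imageio API and asks ImageIO whether a reader is registered for a format name. The class name ImageIoPluginCheck and the helper hasReader are made up for this illustration, and the assumption that PDFBox 2.0 looks readers up under the names "JPEG2000" and "JBIG2" is based on its error messages. Calling ImageIO.scanForPlugins() first is a commonly suggested workaround for servlet containers such as Tomcat; it is not guaranteed to fix every classloader setup.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

class ImageIoPluginCheck {

    /** Returns true if at least one ImageIO reader is registered for the given format name. */
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Re-scan the classpath for ImageIO plugins. This can help when the plugin jars
        // were not visible at the time ImageIO was first initialized.
        ImageIO.scanForPlugins();

        // "png" ships with the JDK; the other two depend on the optional jars being on the classpath.
        for (String format : new String[] {"png", "JPEG2000", "JBIG2"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

If the JPEG2000 line reports false inside the running application even though the dependencies are in the pom, the jars are probably not visible to the classloader that initialized ImageIO, which would match the Tomcat discussion in the comments.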

Tilman Hausherr