PDFBox generating blank images due to JBIG2 images in the PDF

Question

Let me give you an overview of my project first. I have a PDF that I need to convert into images (one image per page) using the PDFBox API, and then write all those images into a new PDF, again using PDFBox. In effect this converts a PDF into a PDF, which we refer to as PDF transcoding.

For certain PDFs that contain JBIG2 images, PDFBox's convertToImage() method fails silently, without any exception or error, and the resulting PDF contains only blank (white) pages. The messages I get on the console are:

Dec 06, 2013 5:15:42 PM org.apache.pdfbox.filter.JBIG2Filter decode
SEVERE: Can't find an ImageIO plugin to decode the JBIG2 encoded datastream.
Dec 06, 2013 5:15:42 PM org.apache.pdfbox.pdmodel.graphics.xobject.PDPixelMap getRGBImage
SEVERE: Something went wrong ... the pixelmap doesn't contain any data.
Dec 06, 2013 5:15:42 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
WARNING: getRGBImage returned NULL

How can I resolve this issue? There is a class that can be imported like this:

import org.apache.pdfbox.filter.JBIG2Filter;

but I don't know how to use it.

I have been searching for a solution, but to no avail. Could anyone please suggest one?

Ah, the exception gives a hint: "Can't find an ImageIO plugin to decode the JBIG2 encoded datastream"; PDFBox uses Java standard classes when rendering images, and they require external JBIG2 support. — mkl, Dec 06 '13 at 13:23
But then what's the use of import org.apache.pdfbox.filter.JBIG2Filter — Vaibhav Sawalkar, Dec 06 '13 at 13:34
*what's the use of `JBIG2Filter`* - As the Javadoc of that class says it is *Modeled on the JBIG2Decode filter.* According to the [specification ISO 32000-1](http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf) section 7.4.7 "JBIG2Decode Filter": *The JBIG2Decode filter (PDF 1.4) decodes monochrome (1 bit per pixel) image data that has been encoded using JBIG2 encoding.* — mkl, Dec 06 '13 at 19:29

adam · Accepted Answer · 2018-05-17T14:25:05.400

12

Take a look at PDFBox ticket PDFBOX-1067 (https://issues.apache.org/jira/browse/PDFBOX-1067). I think the answer to your question is:

1. Make sure that JAI and the JAI-ImageIO plugins are installed for your version of Java. Decent installation instructions are available here: http://docs.geoserver.org/latest/en/user/production/java.html
2. Use the JBIG2-imageio plugin (newer versions are licensed under the Apache 2 license): https://github.com/levigo/jbig2-imageio/
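
To confirm that ImageIO can actually see a JBIG2 reader once the jar is on the classpath, a check along the following lines may help. This is only a sketch: `ReaderCheck` and `hasReaderFor` are names made up for illustration, and it assumes the plugin registers itself under the format name "JBIG2".

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import java.util.Iterator;

// Hypothetical helper, not part of PDFBox or the plugin.
final class ReaderCheck {

    // Returns true if at least one ImageIO reader is registered for the format name.
    static boolean hasReaderFor(String formatName) {
        // Rescan the classpath so that plugin jars added at runtime are picked up.
        ImageIO.scanForPlugins();
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Assumption: the JBIG2 plugin registers under the format name "JBIG2".
        System.out.println("JBIG2 reader available: " + hasReaderFor("JBIG2"));
    }
}
```

If this prints `false`, the plugin jar is most likely not on the classpath that the application actually runs with.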

edited May 17 '18 at 14:25

answered Jan 10 '14 at 23:20

adam

I downloaded `jbig2-imageio-3.0.0.jar` from http://search.maven.org/remotecontent?filepath=org/apache/pdfbox/jbig2-imageio/3.0.0/jbig2-imageio-3.0.0.jar and this solved the problem (thank you!) and it has an Apache license which is more permissive than GPL3 ... – gordon613 May 17 '18 at 10:45

score 5 · Answer 2 · answered Jul 09 '19 at 18:45

I had the same problem and fixed it by adding this dependency to my pom.xml:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>jbig2-imageio</artifactId>
    <version>3.0.2</version>
</dependency>

Good luck.

answered Jul 09 '19 at 18:45

youssouf diarra

score 3 · Answer 3 · edited Feb 11 '17 at 11:14

I had the exact same problem. I downloaded the jar from the jbig2-imageio project, added it to my project's application libraries, and it worked right out of the box. As adam said, it uses GPL3.

edited Feb 11 '17 at 11:14

Tilman Hausherr

answered Feb 11 '15 at 16:21

lost in binary

score 1 · Answer 4 · edited Feb 11 '17 at 11:15

Installing JAI does not seem to be needed. I only had to download levigo-jbig2-imageio-1.6.5.jar, place it in the folder with my dependency jars, and add it to the Java build path libraries in Eclipse. https://github.com/levigo/jbig2-imageio/
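
Because the failure described in the question is silent, it can be useful to check rendered pages for being entirely white before writing them out. The following is a rough sketch using only standard Java classes. `BlankPageCheck` and `isBlank` are hypothetical names, and a genuinely empty page would also be reported as blank, so treat a positive result as a hint rather than proof that the plugin is missing.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for spotting pages that rendered as pure white.
final class BlankPageCheck {

    // Returns true if every pixel of the image is white (alpha is ignored).
    static boolean isBlank(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // Mask off the alpha channel and compare the RGB part with white.
                if ((image.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Calling `isBlank` on each rendered page and logging a warning when it returns true gives an early signal that image decoding may have failed.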

edited Feb 11 '17 at 11:15

Tilman Hausherr

answered Aug 15 '15 at 13:35

Felix Mueller

score 1 · Answer 5 · answered Mar 20 '17 at 19:07

import java.awt.image.BufferedImage
import org.apache.pdfbox.cos.COSName

import org.apache.pdfbox.pdmodel.PDDocument
import org.apache.pdfbox.pdmodel.PDPage
import org.apache.pdfbox.pdmodel.PDPageTree
import org.apache.pdfbox.pdmodel.PDResources
import org.apache.pdfbox.pdmodel.graphics.PDXObject
import org.apache.pdfbox.rendering.ImageType
import org.apache.pdfbox.rendering.PDFRenderer
import org.apache.pdfbox.tools.imageio.ImageIOUtil


import javax.imageio.ImageIO
import javax.imageio.spi.IIORegistry
import javax.imageio.spi.ImageReaderSpi
import javax.swing.*
import javax.swing.filechooser.FileNameExtensionFilter

public class savePDFAsImage {

    // Output directory for the generated PNG files; it must already exist
    static String path = "c:/pdfImage/"

    // Let the user pick the PDF file to process; returns null if the dialog is cancelled
    public static File selectPDF() {
        File file = null
        JFileChooser chooser = new JFileChooser()
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf")
        chooser.setFileFilter(filter)
        chooser.setMultiSelectionEnabled(false)
        int returnVal = chooser.showOpenDialog(null)
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            file = chooser.getSelectedFile()
            println "Please wait..."
        }
        return file
    }

    public static void main(String[] args) {
        try {
            // List the registered ImageIO reader plugins to check that the
            // JBIG2 and JAI plugins on the classpath have been picked up
            ImageIO.scanForPlugins()
            IIORegistry reg = IIORegistry.getDefaultInstance()
            Iterator spIt = reg.getServiceProviders(ImageReaderSpi.class, false)
            spIt.each(){
                println it.getProperties()
            }
            testPDFBoxSaveAsImage()
            testPDFBoxExtractImagesX()
        } catch (Exception e) {
            e.printStackTrace()
        }
    }

    // Extract every image XObject from the page resources and save it as a PNG
    public static void testPDFBoxExtractImagesX() throws Exception {
        PDDocument document = PDDocument.load(selectPDF())
        PDPageTree list = document.getPages()
        for (PDPage page : list) {
            PDResources pdResources = page.getResources()
            for (COSName c : pdResources.getXObjectNames()) {
                PDXObject o = pdResources.getXObject(c)
                if (o instanceof org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) {
                    File file = new File(path + System.nanoTime() + ".png")
                    ImageIO.write(((org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject) o).getImage(), "png", file)
                }
            }
        }
        document.close()
        println "Extraction complete"
    }

    // Render every page at 300 DPI as a black-and-white image and save it as a PNG
    public static void testPDFBoxSaveAsImage() throws Exception {
        PDDocument document = PDDocument.load(selectPDF())
        PDFRenderer pdfRenderer = new PDFRenderer(document)
        for (int page = 0; page < document.getNumberOfPages(); ++page) {
            BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.BINARY)
            // withStream closes the output stream once the image has been written
            new FileOutputStream(path + System.nanoTime() + ".png").withStream { out ->
                ImageIOUtil.writeImage(bim, "png", out, 300)
            }
        }
        document.close()
        println "Rendering complete"
    }
}

It is always better when you provide some brief explanation about your code snippet. It helps others to understand your code better. — RITZ XAVI, Mar 20 '17 at 19:30
In the above code: 1. the `ImageIO` block in main() lists the reader plugins available to the JVM; 2. testPDFBoxSaveAsImage() saves each page as a single image; 3. testPDFBoxExtractImagesX() extracts the images embedded in the PDF using getXObject(). — gbr, Mar 31 '17 at 15:01
