
I'm trying to extract an image from a PDF using a custom ILocationExtractionStrategy that processes text, shapes, and images together with PdfCanvasProcessor. After the source file is closed, I need to reuse this image (copy it to another document or save it as a PNG file).

I call ImageRenderInfo#getImage()#getImageBytes() to get the image as a byte[].

To copy to another file:

imageData = ImageDataFactory.create(bytes);
image = new Image(imageData);
pdfCanvas.add(image, rectangle, false);

And to save as PNG:

// Internal function to write bytes to file
FileUtility.writeBytesToFile(path, fileName, bytes);

When I add the created image to the PdfCanvas or save it as a PNG, the background becomes black.

EDIT:
I managed to extract both the image and its transparency image from the document, but I couldn't merge them into one image.

My attempts led me to use ImageData for both images:

ImageData image = ImageDataFactory.create(imageBytes);
ImageData transparency = ImageDataFactory.create(transparencyBytes);
transparency.makeMask();
image.setImageMask(transparency);

When I save the image to a PDF it looks as expected, but when I save it as a PNG the black background is still there.
I also saved the transparency image on its own as a PNG, and it has the black background as well.

EDIT: I successfully solved my problem. This is the full code from my extractor:

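import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;
import java.awt.image.FilteredImageSource;
import java.awt.image.ImageFilter;
import java.awt.image.ImageProducer;
import java.awt.image.RGBImageFilter;
import java.io.IOException;

import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo;
import com.itextpdf.kernel.pdf.xobject.PdfImageXObject;
import com.itextpdf.layout.element.Image;

private void readImage(ImageRenderInfo data) {
    try {
        // Draw the color image onto an ARGB canvas so it can carry an alpha channel.
        BufferedImage inputImage = data.getImage().getBufferedImage();
        BufferedImage dest = new BufferedImage(inputImage.getWidth(), inputImage.getHeight(),
                BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = dest.createGraphics();
        graphics.drawImage(inputImage, 0, 0, null);

        // In a PDF, the transparency is stored as a separate image (the soft mask).
        if (data.getImage().getPdfObject().containsKey(PdfName.SMask)) {
            // getRefersTo() resolves the indirect reference and loads the mask image object.
            PdfObject refersTo = data.getImage().getPdfObject().get(PdfName.SMask).getIndirectReference()
                    .getRefersTo();
            if (refersTo != null && refersTo.isStream()) {

                BufferedImage maskImage = new PdfImageXObject((PdfStream) refersTo).getBufferedImage();

                // DST_IN keeps the color pixels already drawn, scaled by the alpha of the mask.
                java.awt.Image transparency = transformTransparency(maskImage);
                AlphaComposite ac = AlphaComposite.getInstance(AlphaComposite.DST_IN, 1.0F);
                graphics.setComposite(ac);
                graphics.drawImage(transparency, 0, 0, null);
            }
        }
        graphics.dispose();

        Image image = new Image(ImageDataFactory.create(dest, null));

        // dest and image now carry the transparency and are ready to be saved.
    } catch (IOException e) {
        e.printStackTrace();
    }
}

/**
 * Converts a grayscale mask into an image that carries only an alpha channel.
 * The red channel (bits 16-23) is shifted into the alpha position (bits 24-31)
 * and all color bits are cleared, so white becomes opaque and black becomes transparent.
 *
 * @param image the grayscale mask image read from the SMask stream
 * @return an alpha-only image to be composited with DST_IN
 */
private java.awt.Image transformTransparency(BufferedImage image) {
    ImageFilter filter = new RGBImageFilter() {

        @Override
        public final int filterRGB(int x, int y, int rgb) {
            return (rgb << 8) & 0xFF000000;
        }
    };

    ImageProducer ip = new FilteredImageSource(image.getSource(), filter);
    return Toolkit.getDefaultToolkit().createImage(ip);
}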
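
For anyone who wants to try the masking step without a PDF at hand, here is a minimal sketch of the same idea using only the JDK. It is not the extractor code above: the class name AlphaMaskSketch and its methods are made up for illustration, it uses a plain pixel loop instead of RGBImageFilter, and it assumes the mask has the same pixel dimensions as the color image and stores gray values as R=G=B (a soft mask in a real PDF may have a different resolution and would need scaling first).

```java
import java.awt.AlphaComposite;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of iText or of the extractor above.
final class AlphaMaskSketch {

    /** Returns an ARGB copy of {@code color} whose alpha is the gray level of {@code grayMask}. */
    static BufferedImage applyMask(BufferedImage color, BufferedImage grayMask) {
        int w = color.getWidth();
        int h = color.getHeight();
        // Assumption: both images have identical dimensions.
        if (grayMask.getWidth() != w || grayMask.getHeight() != h) {
            throw new IllegalArgumentException("mask and color image must have the same size");
        }

        // Move the red channel of the mask into the alpha position and clear the color bits.
        BufferedImage alphaOnly = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                alphaOnly.setRGB(x, y, (grayMask.getRGB(x, y) << 8) & 0xFF000000);
            }
        }

        BufferedImage dest = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = dest.createGraphics();
        try {
            g.drawImage(color, 0, 0, null);
            // DST_IN keeps the pixels already drawn, multiplied by the alpha of the next image.
            g.setComposite(AlphaComposite.DstIn);
            g.drawImage(alphaOnly, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dest;
    }

    /** Encodes the image as PNG; an ARGB image keeps its alpha channel. */
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes PNG bytes back into a BufferedImage. */
    static BufferedImage fromPng(byte[] png) {
        try {
            return ImageIO.read(new ByteArrayInputStream(png));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two red pixels; the mask is white for the first pixel and black for the second.
        BufferedImage color = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        color.setRGB(0, 0, 0xFF0000);
        color.setRGB(1, 0, 0xFF0000);
        BufferedImage mask = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        mask.setRGB(0, 0, 0xFFFFFF);
        mask.setRGB(1, 0, 0x000000);

        BufferedImage merged = fromPng(toPng(applyMask(color, mask)));
        System.out.println("alpha of pixel 0: " + (merged.getRGB(0, 0) >>> 24));
        System.out.println("alpha of pixel 1: " + (merged.getRGB(1, 0) >>> 24));
    }
}
```

The PNG round trip in main is only there to show that the alpha channel survives when the merged ARGB image is written with ImageIO. Writing the raw color bytes alone, as in the first attempt, drops the mask, which would explain the black background.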
  • I think it's because PDF has no native PNG image type. PNG has color channels, like BMP, plus an alpha channel for transparency. When you add a PNG image to a PDF, you actually add two images: one with the color information, and another with the alpha (transparency) information. I will link to a related question (and answer). – Amedee Van Gasse Oct 21 '20 at 10:40
  • Bruno's answer there is a bit long-winded, so to summarize: you need to extract not one but two images (color + transparency). – Amedee Van Gasse Oct 21 '20 at 10:43
  • Another solution, which may work depending on your type of documents: from the very start of your document creation, don't use PNG at all. Use BMP instead. BMP does not have transparency, so you avoid the problem of having two images layered on top of each other. Whether this is useful for you really depends on your type of documents. – Amedee Van Gasse Oct 21 '20 at 10:46
  • Thank you for your answers. I understand what Bruno meant, but I only see one image in the processor event. Is there another way to extract both images? – Michael Azarzar Oct 21 '20 at 11:45
  • I hope that someone else can answer that. :) – Amedee Van Gasse Oct 21 '20 at 13:39
