I'm trying to convert a PDF to an image in Java, but when I render it with PDFRenderer, any text that is not in English comes out unreadable in the image file, as shown here:

[image: rendered output with unreadable text]

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

import javax.imageio.ImageIO;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PdfToImage {

    public static void main(String[] args) {

        File file = new File("path file");
        // try-with-resources closes the file and its channel when rendering is done
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {

            ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            PDFFile pdffile = new PDFFile(buf);
            // draw every page to its own image (pages are numbered from 1)
            int num = pdffile.getNumPages();
            for (int i = 1; i <= num; i++) {
                PDFPage page = pdffile.getPage(i);

                // get the width and height for the doc at the default zoom
                int width = (int) page.getBBox().getWidth();
                int height = (int) page.getBBox().getHeight();

                // swap the clip rectangle's sides for pages rotated by 90 or 270 degrees
                Rectangle rect = new Rectangle(0, 0, width, height);
                int rotation = page.getRotation();
                Rectangle rect1 = rect;
                if (rotation == 90 || rotation == 270)
                    rect1 = new Rectangle(0, 0, rect.height, rect.width);

                // generate the image
                BufferedImage img = (BufferedImage) page.getImage(
                        rect.width, rect.height, // width & height
                        rect1, // clip rect
                        null, // null for the ImageObserver
                        true, // fill background with white
                        true // block until drawing is done
                );

                ImageIO.write(img, "png", new File("path file" + i + ".png"));
            }
        } catch (FileNotFoundException e1) {
            System.err.println(e1.getLocalizedMessage());
        } catch (IOException e) {
            System.err.println(e.getLocalizedMessage());
        }
    }
}

Does anyone know how to render a PDF to an image when the text is in a language other than English?

mkl
Mahalo Bankupu

1 Answer

Those little boxes are the 'unknown character' glyph, which gets drawn when the font in use has no glyph for a character. It's possible your PDF didn't embed the requisite fonts and your system doesn't have them either... though if the pages look fine in any ol' PDF viewer (Reader can be too forgiving), that's probably not it.

Back in the day, I used a command-line program called Ghostscript. It's free and open source, so you can just download it, read up on the command line, and let 'er rip. Heck, I used it years ago, so it may even have some slick UI by now.
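
If you'd rather drive Ghostscript from your Java code than from a shell, something along these lines might work. Treat it as a rough sketch, not a tested recipe: the class name `GhostscriptRender`, the file names `input.pdf` and `page-%03d.png`, and the 150 dpi resolution are all made up for illustration, and it assumes the Ghostscript executable is on your PATH as `gs` (on Windows the console executable is usually `gswin64c`). The switches themselves (`-dNOPAUSE`, `-dBATCH`, `-sDEVICE=png16m`, `-r`, `-sOutputFile`) are standard Ghostscript options.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class GhostscriptRender {

    // Builds the Ghostscript argument list that renders every page of a PDF to PNG.
    // outputPattern may contain %03d, which Ghostscript replaces with the page number.
    static List<String> buildCommand(String pdfPath, String outputPattern, int dpi) {
        List<String> cmd = new ArrayList<>();
        cmd.add("gs");                              // assumption: executable name on PATH
        cmd.add("-dNOPAUSE");                       // don't pause after each page
        cmd.add("-dBATCH");                         // exit once the file is processed
        cmd.add("-sDEVICE=png16m");                 // 24-bit colour PNG output
        cmd.add("-r" + dpi);                        // resolution in dots per inch
        cmd.add("-sOutputFile=" + outputPattern);   // one output file per page
        cmd.add(pdfPath);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file names, replace with your own paths.
        List<String> cmd = buildCommand("input.pdf", "page-%03d.png", 150);

        // inheritIO() forwards Ghostscript's messages to this process's console.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        int exitCode = process.waitFor();
        System.out.println("Ghostscript exited with code " + exitCode);
    }
}
```

Building the argument list in its own method keeps the command easy to inspect or log before anything is launched. If the fonts really are missing from both the PDF and your system, Ghostscript may substitute fonts too, so check its console output for substitution warnings.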

Mark Storer