
I want to create an image from the first page of a PDF. I am using PDFBox. After researching on the web, I found the following snippet of code:

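import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

// Extracts the images embedded in each page of a PDF (PDFBox 1.x API)
public class ExtractImages
{
    public static void main(String[] args)
    {
        ExtractImages obj = new ExtractImages();
        try
        {
            obj.read_pdf();
        }
        catch (IOException ex)
        {
            System.out.println("" + ex);
        }
    }

    void read_pdf() throws IOException
    {
        // Let a load failure propagate instead of continuing with a null document
        PDDocument document = PDDocument.load("H:\\ct1_answer.pdf");
        try
        {
            List<?> pages = document.getDocumentCatalog().getAllPages();
            Iterator<?> iter = pages.iterator();
            int i = 1;

            while (iter.hasNext())
            {
                PDPage page = (PDPage) iter.next();
                PDResources resources = page.getResources();
                // Image XObjects embedded in this page; null if the page has no resources
                Map<?, ?> pageImages = (resources == null) ? null : resources.getImages();
                if (pageImages != null)
                {
                    Iterator<?> imageIter = pageImages.keySet().iterator();
                    while (imageIter.hasNext())
                    {
                        String key = (String) imageIter.next();
                        PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                        // write2file appends a suffix matching the image format (e.g. ".jpg")
                        image.write2file("H:\\image" + i);
                        i++;
                    }
                }
            }
        }
        finally
        {
            document.close();
        }
    }
}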

The code above runs without any error, but it produces no output. I expected it to produce a series of images saved on the H: drive, but no images are created. Why?

Christophe Roussy
osimer pothe
  • What a fantastic observation you have made! I have understood every line of this code, and it seems that it should accomplish my goal. But in fact it doesn't work as I expected. – osimer pothe Feb 14 '13 at 06:34
    Are you trying to extract images embedded in your PDF page and write them to disk? Because *that* is what this code does. – Brian Roach Feb 14 '13 at 06:37
  • I want to output the first page of the PDF as an image. – osimer pothe Feb 14 '13 at 06:39
  • Possible duplicate of: http://stackoverflow.com/questions/4523688/pdfbox-problem-with-converting-pdf-page-into-image – Christophe Roussy Feb 14 '13 at 12:49
  • The question you linked is not a duplicate of mine. The code there uses the BufferedImage class, which is not supported on Android. @Christophe Roussy – osimer pothe Feb 14 '13 at 19:17

2 Answers


Without trying to be rude, here is what the code you posted does inside its main working loop:

PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map pageImages = resources.getImages();

It gets each page from the PDF file, gets the resources of that page, and extracts the images embedded in them, which it then writes to disk. It never renders the page itself, so if the pages contain no embedded images, nothing is written.

If you want to be a competent software developer, you need to be able to research and read documentation. In Java, that means Javadocs. Googling PDPage (or going directly to the Apache site) turns up the Javadoc for PDPage.

On that page you find two versions of the method convertToImage(), convertToImage() and convertToImage(int imageType, int resolution), for converting the PDPage to an image. Problem solved.
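For desktop Java (not Android), here is a minimal sketch of rendering only the first page, assuming PDFBox 1.x (for example 1.7.1) is on the classpath. It loads the file, takes page 0 from getAllPages(), rasterizes the whole page with convertToImage(int imageType, int resolution), and saves the result with javax.imageio.ImageIO. The class name FirstPageToImage, the method renderFirstPage, the output path H:\page1.png and the 96 DPI value are illustrative choices, not part of the original question. Note that PDFBox 2.x removed convertToImage in favor of org.apache.pdfbox.rendering.PDFRenderer, so this sketch targets 1.x only.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Sketch assuming the PDFBox 1.x API; needs java.awt and javax.imageio (desktop Java).
public class FirstPageToImage
{
    // Renders page 1 of pdfFile at the given resolution (DPI) and saves it as PNG.
    // Returns the result of ImageIO.write (false if no PNG writer was found).
    public static boolean renderFirstPage(File pdfFile, File pngFile, int dpi) throws IOException
    {
        PDDocument document = PDDocument.load(pdfFile);
        try
        {
            // getAllPages() returns a raw List in PDFBox 1.x
            List<?> pages = document.getDocumentCatalog().getAllPages();
            if (pages.isEmpty())
            {
                throw new IOException("PDF has no pages: " + pdfFile);
            }
            PDPage firstPage = (PDPage) pages.get(0);
            // Rasterizes the whole page, not just the images embedded in it
            BufferedImage image = firstPage.convertToImage(BufferedImage.TYPE_INT_RGB, dpi);
            return ImageIO.write(image, "png", pngFile);
        }
        finally
        {
            document.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Input path from the question; the output path is hypothetical
        renderFirstPage(new File("H:\\ct1_answer.pdf"), new File("H:\\page1.png"), 96);
    }
}
```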

Except ...

Unfortunately, both return a java.awt.image.BufferedImage. Based on other questions you have asked, that is a problem: BufferedImage is not supported on Android, which is the platform you're working on.

In short, you can't use Apache's PDFBox on Android to do what you're trying to do.

Searching Stack Overflow, you'll find this same question asked several times in different forms, which leads to this question: https://stackoverflow.com/questions/4665957/pdf-parsing-library-for-android/4766335#4766335 and, in particular, to this answer, which should interest you: https://stackoverflow.com/a/4779852/302916

Unfortunately, even the library that answer says will work is not very user-friendly; there's no how-to or documentation that I can find, and it's labeled "alpha". This is probably not something for the faint-hearted, as you'd have to read and understand its code before you could even start using it.

Brian Roach
  • But creating an image from a PDF page is a very common task; Aldiko, a PDF reader, does it. I have to do it at any cost. – osimer pothe Feb 15 '13 at 19:03
  • `ImageIO` and `BufferedImage` are all you need to write an image to disk. But you're saying these Java libs are not supported on Android? O.o – Don Cheadle Feb 13 '15 at 15:34
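On desktop Java the comment above holds. Here is a minimal, standard-library-only sketch that writes a BufferedImage to disk with ImageIO.write and reads it back; the image is a synthetic placeholder standing in for a rendered page, and the names WriteImageDemo, placeholder and savePng are hypothetical. Neither java.awt.image nor javax.imageio is part of the Android SDK, which is why this route is closed on Android.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class WriteImageDemo
{
    // Draws a white image with a black border: a stand-in for a rendered PDF page
    static BufferedImage placeholder(int width, int height)
    {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try
        {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            g.drawRect(10, 10, width - 21, height - 21);
        }
        finally
        {
            g.dispose();
        }
        return img;
    }

    // Saves img as PNG; ImageIO.write returns false if no PNG writer is available
    static boolean savePng(BufferedImage img, File out) throws IOException
    {
        return ImageIO.write(img, "png", out);
    }

    public static void main(String[] args) throws IOException
    {
        File out = File.createTempFile("page", ".png");
        boolean ok = savePng(placeholder(200, 100), out);
        BufferedImage back = ImageIO.read(out);
        // prints "true 200x100"
        System.out.println(ok + " " + back.getWidth() + "x" + back.getHeight());
    }
}
```

PNG is used because it is lossless, so the image read back has exactly the pixels that were written.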

I copied your code above and added the following libraries to my build path in Eclipse. It works.

Apache PDFBox 1.7.1 libs

Commons Logging 1.1.1 libs

Wladimir Palant
GltknBtn
  • I want to create an image of the first page of the PDF, but the code snippet above extracts the images embedded in the PDF. What can I do to achieve my goal? – osimer pothe Feb 14 '13 at 18:41
  • Instead of getting all pages with `List<PDPage> pages = document.getDocumentCatalog().getAllPages();`, you can get only the first page. I hope that answers your question. Regards. – GltknBtn Feb 19 '13 at 08:28