
I am trying to retrieve a File or InputStream instance from PDDocument without saving a PDDocument to the file system.

 PDDocument doc= new PDDocument(); 
 ...     
 doc.save("D:\\document.pdf"); 
 File f= new File("D:\\document.pdf"); 

Is there any method in PDFBox which returns File or InputStream from an existing PDDocument?

Franz Kafka
Milos Gavrilov

4 Answers


I solved it:

PDDocument doc=new PDDocument();        
PDStream ps=new PDStream(doc);
InputStream is=ps.createInputStream();
Franz Kafka
Milos Gavrilov
  • This "solution" does not make any sense. I doubt any of the up-voters has actually tested it, nor do they seem to know PDFBox. E.g. the JavaDocs of the `public PDStream(PDDocument document)` constructor describe the `document` parameter as *The document that the stream will be part of.* So `ps=new PDStream(doc)` merely creates a new (empty) PDF stream *inside* the document, *not* a stream *containing* the document. – mkl Apr 09 '18 at 09:05
  • Maybe it helps: https://stackoverflow.com/questions/11593116/using-pdfbox-how-do-i-retrieve-contents-of-pddocument-as-a-byte-array/17656461#17656461 – mkczyk Feb 10 '20 at 11:27

I solved it this way (it still creates a file, but in the temporary-file directory):

final PDDocument document = new PDDocument();
final File file = File.createTempFile(filename, ".pdf");
document.save(file);

and, when you are done with the document:

document.close();
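If the temp-file route is acceptable but what you really need is an InputStream, one option is to reopen the file with `StandardOpenOption.DELETE_ON_CLOSE`, so the file disappears once the stream is closed. This is a minimal stdlib-only sketch (Java 9+ for `readAllBytes`); `DocWriter` and `viaTempFile` are hypothetical names. With PDFBox you would pass `doc::save`, which should resolve to `PDDocument.save(OutputStream)`; the fake writer in `main` only stands in for a real document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempFileStream {

    /** Hypothetical callback; with PDFBox, pass doc::save (PDDocument.save(OutputStream)). */
    @FunctionalInterface
    interface DocWriter {
        void write(OutputStream out) throws IOException;
    }

    /** Writes the document to a temp file and returns a stream that deletes the file on close. */
    static InputStream viaTempFile(DocWriter writer) throws IOException {
        Path tmp = Files.createTempFile("doc", ".pdf");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            writer.write(out);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // don't leave the temp file behind if writing fails
            throw e;
        }
        // DELETE_ON_CLOSE removes the temp file when the caller closes the stream.
        return Files.newInputStream(tmp, StandardOpenOption.DELETE_ON_CLOSE);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in writer; with PDFBox this would be viaTempFile(doc::save).
        DocWriter fake = out -> out.write("%PDF-1.4 fake".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = viaTempFile(fake)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // prints "%PDF-1.4 fake"
        }
    }
}
```

The caller must close the returned stream (ideally with try-with-resources); otherwise the file stays in the temp directory just as in the original answer.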

What if you first create the output stream?

PDDocument doc= new PDDocument(); 
File f= new File("D:\\document.pdf");
FileOutputStream fOut = new FileOutputStream(f);  
doc.save(fOut); 

Take a look at the save(OutputStream) method: http://pdfbox.apache.org/apidocs/org/apache/pdfbox/pdmodel/PDDocument.html#save(java.io.OutputStream)

fGo
  • The problem is that I don't want to create a file on my file system; I want to put this PDF file directly into the Alfresco repository. Do you have any idea how I can do this? – Milos Gavrilov Jun 03 '13 at 10:16
  • @MilosGavrilov And what protocol does Alfresco support for this transfer? As soon as you have an output stream to write to, you are good to go. – fGo Jun 03 '13 at 10:35
  • This does not answer the question, as it gets the stream from a file on the computer. – william.eyidi Jul 07 '15 at 15:05
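Following fGo's point that any output stream will do: if the target API (Alfresco or anything else) wants an InputStream instead, a pipe avoids both the temp file and a full in-memory copy. This is a swapped-in technique using plain java.io, not a PDFBox feature. It is a minimal sketch assuming a hypothetical `DocWriter` callback (with PDFBox, `doc::save`) that runs on a background thread while the caller reads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipedDocStream {

    /** Hypothetical callback; with PDFBox, pass doc::save (PDDocument.save(OutputStream)). */
    @FunctionalInterface
    interface DocWriter {
        void write(OutputStream out) throws IOException;
    }

    /** Returns an InputStream fed by a background thread that runs the writer. */
    static InputStream viaPipe(DocWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024); // 64 KiB pipe buffer
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            try (OutputStream o = out) { // closing the write end signals end-of-stream to the reader
                writer.write(o);
            } catch (IOException e) {
                // Sketch only: the reader just sees a truncated stream; real code should pass e on.
                e.printStackTrace();
            }
        }, "doc-writer");
        producer.setDaemon(true);
        producer.start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in writer; with PDFBox this would be viaPipe(doc::save).
        DocWriter fake = out -> out.write("%PDF-1.4 fake".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = viaPipe(fake)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // prints "%PDF-1.4 fake"
        }
    }
}
```

Two caveats with this design: PDFBox objects are generally not thread-safe, so leave `doc` alone (and don't close it) until the stream has been read completely; and an exception in the writer thread only reaches the reader as a shorter stream unless you add your own error hand-off.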

I am trying to retrieve a File or InputStream instance from PDDocument without saving a PDDocument to the file system.

[...]

Is there any method in PDFBox which returns File or InputStream from an existing PDDocument?

Obviously PDFBox cannot return a meaningful File object without saving a PDDocument to the file system.

It does not offer a method that provides an InputStream directly either, but it is easy to write code around it that does, e.g.:

InputStream docInputStream = null;

try (   ByteArrayOutputStream baos = new ByteArrayOutputStream();
        PDDocument doc = new PDDocument()   )
{
    [...]
    doc.save(baos);
    docInputStream = new ByteArrayInputStream(baos.toByteArray());
}
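The same idea can be wrapped in a small reusable helper. This sketch uses a hypothetical `DocWriter` interface so it compiles without PDFBox on the classpath; with PDFBox you would call `toInputStream(doc::save)` inside the try-with-resources block above. The trade-off is that the whole serialized document is held in memory (briefly twice, since `toByteArray()` copies the buffer).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryDocStream {

    /** Hypothetical callback; with PDFBox, pass doc::save (PDDocument.save(OutputStream)). */
    @FunctionalInterface
    interface DocWriter {
        void write(OutputStream out) throws IOException;
    }

    /** Serializes the document into memory and exposes the bytes as an InputStream. */
    static InputStream toInputStream(DocWriter writer) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        writer.write(baos);
        // toByteArray() returns a copy, so the stream does not depend on baos afterwards.
        return new ByteArrayInputStream(baos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Stand-in writer; with PDFBox this would be toInputStream(doc::save).
        DocWriter fake = out -> out.write("%PDF-1.4 fake".getBytes(StandardCharsets.US_ASCII));
        InputStream in = toInputStream(fake);
        System.out.println(in.available()); // prints 13 (bytes remaining in the array)
    }
}
```

Because the bytes are fully written before the stream is returned, the PDDocument can be closed right away, which is not the case for the piped approach.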
Community
mkl