
I'm using iTextSharp in a C# app that reads PDF files and breaks out the pages as separate PDF documents. It works well, except in the case of portfolios. Now I'm trying to figure out how to read a PDF portfolio (or Collection, as they seem to be called in iText) that contains two embedded PDF documents. I simply want to open the portfolio, enumerate the embedded files, and then save them as separate, simple PDF files.

There's a good example of how to programmatically create a PDF portfolio, here: Kubrick Collection Example

But I haven't seen any examples that read portfolios. Any help would be much appreciated!

Randy Gamage

1 Answer


The example you referenced adds the embedded files as document-level attachments. So you can extract the files like this:

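using System.IO;
using iTextSharp.text.pdf;

PdfReader reader = new PdfReader(readerPath);
PdfDictionary root = reader.Catalog;
// Document-level attachments live under Catalog -> Names -> EmbeddedFiles.
PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
PdfDictionary embeddedfiles = 
    documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
// The Names array alternates between a name string and a file specification.
PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
for (int i = 0; i < filespecs.Size; ) {
  i++; // skip the name string; only the file specification is needed
  PdfDictionary filespec = filespecs.GetAsDict(i++);
  // EF maps keys such as F or UF to the embedded file stream.
  PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
  foreach (PdfName key in refs.Keys) {
    PRStream stream = (PRStream) PdfReader.GetPdfObject(
      refs.GetAsIndirectObject(key)
    );

    // The file specification stores the file name under the same key.
    // FileMode.Create truncates an existing file, so no stale bytes are left over.
    using (FileStream fs = new FileStream(
      filespec.GetAsString(key).ToString(), FileMode.Create
    )){
      byte[] attachment = PdfReader.GetStreamBytes(stream);
      fs.Write(attachment, 0, attachment.Length);
    }
  }
}
reader.Close();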

Pass the output file from the Kubrick Collection Example you referenced to the PdfReader constructor (readerPath) if you want to test this.
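
One caveat: the loop above assumes the EmbeddedFiles name tree keeps all of its entries in a single Names array at the root. The PDF specification also allows a name tree to spread its entries across child nodes listed under a Kids array, so a portfolio from another tool might need a recursive walk. Here is a rough sketch of that, which I have not run against a real file; NameTreeWalker and CollectFileSpecs are hypothetical helper names of my own, not part of iTextSharp:

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

// Hypothetical helper (not an iTextSharp class): gathers every file
// specification dictionary found under an EmbeddedFiles name tree node.
static class NameTreeWalker {
  public static void CollectFileSpecs(
      PdfDictionary node, List<PdfDictionary> result) {
    if (node == null) return;

    // Leaf entries: the Names array alternates name string, file specification,
    // so the file specifications sit at the odd indexes.
    PdfArray names = node.GetAsArray(PdfName.NAMES);
    if (names != null) {
      for (int i = 1; i < names.Size; i += 2) {
        PdfDictionary filespec = names.GetAsDict(i);
        if (filespec != null) result.Add(filespec);
      }
    }

    // Intermediate nodes: recurse into each child listed under Kids.
    PdfArray kids = node.GetAsArray(PdfName.KIDS);
    if (kids != null) {
      for (int i = 0; i < kids.Size; i++) {
        CollectFileSpecs(kids.GetAsDict(i), result);
      }
    }
  }
}
```

With that in place you could call NameTreeWalker.CollectFileSpecs(embeddedfiles, list) on an empty List<PdfDictionary> and then run the body of the extraction loop over each entry in the list.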

Hopefully I'll have time to update the C# examples this month from version 5.2.0.0 (the iTextSharp version is about three weeks behind the Java version right now).

kuujinbo
  • You are the most awesomest! This works perfectly, thanks so much. I had a feeling it had to do with the dictionary under the Catalog, but there's no way I would have figured out all the details. That would be great to add this to the C# examples online. – Randy Gamage Aug 17 '12 at 23:59
  • The above code has the line "byte[] attachment = PdfReader.GetStreamBytes(stream);", which loads the attachment content into a byte array. If the attachment is a PDF file, I can open a PdfReader for it with "PdfReader reader = new PdfReader(attachment)". One problem with this is that the whole attachment file gets loaded into memory (the byte array). Is it possible to get a PdfReader for the attachment without loading the whole attachment file into memory? – Andrew Nov 05 '18 at 07:33