
I'm trying to split a PDF file page by page and get each resulting page file as a byte array. However, I'm having trouble converting each page to a byte array in iText version 7.0.4 for C#.

The methods referenced in other solutions rely on PdfWriter.GetInstance or PdfCopy, which no longer seem to exist in iText version 7.0.4.

I've gone through iText's sample code and API documentation, but I haven't been able to extract anything useful from them.

using (Stream stream = new MemoryStream(pdfBytes))
using (PdfReader reader = new PdfReader(stream))
using (PdfDocument pdfDocument = new PdfDocument(reader))
{
    PdfSplitter splitter = new PdfSplitter(pdfDocument);

    // My Attempt #1 - None of the document's functions seem to be of help.
    foreach (PdfDocument splitPage in splitter.SplitByPageCount(1))
    {
        // ??      
    }

    // My Attempt #2 - GetContentBytes != pdf file bytes.
    for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
    {
        PdfPage page = pdfDocument.GetPage(i);
        byte[] bytes = page.GetContentBytes();
    }
}

Any help would be much appreciated.

James.K
  • Are you dealing with a compressed pdf? Compressing an individual page will not result in the same bytes as when the whole file is compressed. So you should find a better way to define success than "returns the same bytes found in the file" – Ben Voigt Sep 25 '17 at 16:53
  • No, I'm dealing with uncompressed PDF files. All I needed was the ability to split PDF files and store the split pages for later use. Once they're split, I don't need to worry about putting the original document back together. – James.K Sep 26 '17 at 17:41

1 Answer


Using PdfSplitter is a good way to approach this task. Not much is available out of the box, but PdfSplitter is highly customizable, and a look at its implementation (or simply its API) shows where you can inject your own behavior.

Override GetNextPdfWriter to control where each split document is written. You can also pass an IDocumentReadyListener to define the action performed each time a split document is ready.

Here is one implementation that achieves your goal:

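using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

class ByteArrayPdfSplitter : PdfSplitter {

    // Stream backing the split document that is currently being written.
    private MemoryStream currentOutputStream;

    public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) {
    }

    // Called by PdfSplitter for every split document; write it to memory instead of a file.
    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
        currentOutputStream = new MemoryStream();
        return new PdfWriter(currentOutputStream);
    }

    public MemoryStream CurrentMemoryStream {
        get { return currentOutputStream; }
    }

    public class DocumentReadyListender : IDocumentReadyListener {

        private ByteArrayPdfSplitter splitter;

        public DocumentReadyListender(ByteArrayPdfSplitter splitter) {
            this.splitter = splitter;
        }

        public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange) {
            // Closing the document flushes the complete PDF into the memory stream.
            pdfDocument.Close();
            // ToArray() still works after the stream has been closed.
            byte[] contents = splitter.CurrentMemoryStream.ToArray();
            // Caveat from the comments: ToString() may return only the type name;
            // PageRange.GetQualifyingPageNums was used there to get the page number.
            String pageNumber = pageRange.ToString();
            // Store or process 'contents' here; this sample does nothing with it.
        }
    }
}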

The calls are essentially the same as yours, but with a custom document-ready listener:

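// 'path' is the location of the source PDF.
PdfDocument docToSplit = new PdfDocument(new PdfReader(path));
ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
// DocumentReady is invoked once for each single-page document that is produced.
splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter));
docToSplit.Close();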
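
If the page bytes are needed back in the calling code rather than inside the listener, one possible variant is to remember every stream handed out by GetNextPdfWriter and read them all after the split. The following is only a sketch under stated assumptions: CollectingPdfSplitter, SplitToByteArrays and SplitExample are hypothetical names, not part of iText, and it assumes that the SplitByPageCount(int) overload returns the split documents in page order, so that list index i corresponds to page i + 1.

```csharp
using System.Collections.Generic;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

// Hypothetical helper (not part of iText): keeps every stream it creates
// so the caller can collect the bytes of all split documents at once.
class CollectingPdfSplitter : PdfSplitter
{
    private readonly List<MemoryStream> streams = new List<MemoryStream>();

    public CollectingPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) { }

    // One in-memory stream per split document, remembered in creation order.
    protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
    {
        MemoryStream stream = new MemoryStream();
        streams.Add(stream);
        return new PdfWriter(stream);
    }

    // Splits into single-page documents and returns one byte array per page.
    public IList<byte[]> SplitToByteArrays()
    {
        streams.Clear();
        foreach (PdfDocument part in SplitByPageCount(1))
        {
            // Closing writes the finished PDF into its memory stream.
            part.Close();
        }

        List<byte[]> pages = new List<byte[]>();
        foreach (MemoryStream stream in streams)
        {
            // MemoryStream.ToArray() is still valid after the stream is closed.
            pages.Add(stream.ToArray());
        }
        return pages;
    }
}

static class SplitExample
{
    // Usage: pdfBytes holds the source PDF, as in the question.
    public static IList<byte[]> SplitPages(byte[] pdfBytes)
    {
        using (PdfDocument source = new PdfDocument(new PdfReader(new MemoryStream(pdfBytes))))
        {
            return new CollectingPdfSplitter(source).SplitToByteArrays();
        }
    }
}
```

The design choice here is to trade the listener callback for a plain return value, which avoids sharing state between the splitter and the listener. It does keep all split documents in memory at the same time, so for very large files the listener-based version above may be preferable.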
Alexey Subach
  • Thank you very much! I should've read the docs more carefully... The only thing is that pageRange.ToString() returns a string of the object type, so I used GetQualifyingPageNums, along with slight modifications to your solution to get the correct page number for each page. I'm not modifying your solution because this is specific to my case. – James.K Sep 25 '17 at 17:30
  • @alexey-subach, any idea on how we can get byte array from SplitBySize method? Seems it does not have a DocumentReadyListener to notify when the split is finished. – Vinícius Fonseca Nov 15 '18 at 12:20
  • Great solution Alexey. May you please show how do you get each page bytes from contents variable back to the main thread? I can see how the variable is loaded with each page bytes but I don't reach to get it from the usage case you posted – Matias Masso Jun 29 '23 at 07:03