
I'm trying to use the code below to merge the PDF files in a folder and write the result to a new file, but the generated file appears to be corrupted.

public Boolean MergeForm(String destinationFile, String sourceFolder)
    {
        try
        {
            using (MemoryStream stream = new MemoryStream())
            using (Document doc = new Document())
            using (PdfCopy pdf = new PdfCopy(doc, stream))
            {
                doc.Open();

                PdfReader reader = null;
                PdfImportedPage page = null;

                foreach (var file in Directory.GetFiles(sourceFolder))
                {
                    reader = new PdfReader(file);
                    for (int i = 0; i < reader.NumberOfPages; i++)
                    {
                        page = pdf.GetImportedPage(reader, i + 1);
                        pdf.AddPage(page);
                    }

                    pdf.FreeReader(reader);
                    reader.Close();
                }
                using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
                {
                    stream.WriteTo(streamX);
                }
            }
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }

Can anyone spot where the problem is? Thank you.

Trowa
  • This looks very much like a duplicate of the recent question [using PdfCopy to merge pdf files](https://stackoverflow.com/questions/45951966/using-pdfcopy-to-merge-pdf-files). Why aren't you using the `AddDocument()` method instead of looping over the different pages, and adding only one page at a time? Are you using a recent version of iText? – Bruno Lowagie Sep 08 '17 at 08:32
  • The main problem, however, is the moment at which you are writing the file. When you do `stream.WriteTo(streamX)`, the `Document` instance hasn't been closed yet. This means that the PDF that is written to `streamX` is incomplete. Plenty of information (such as the cross-reference table, fonts, the PDF trailer) is missing. That information is only added to `stream` when `doc.Close()` happens. In your case, this happens implicitly when one of those `}` brackets *after* `stream.WriteTo(streamX)` is reached. – Bruno Lowagie Sep 08 '17 at 08:34
  • @BrunoLowagie I'm using itextsharp 5.5.12.0. By the way, how do I modify the code to use AddDocument? I will take a look at the other thread you shared here, thank you. – Trowa Sep 08 '17 at 08:37
  • Did you read the code in the answer I refer to? You'll find something like this: `reader = New PdfReader(file); copy.AddDocument(reader); reader.Close();` There is *no need to loop over the pages*! – Bruno Lowagie Sep 08 '17 at 08:52
  • @BrunoLowagie Oops, I read your second comment only after writing the answer... – mkl Sep 08 '17 at 10:52
  • @mkl No problem. I upvoted your answer. – Bruno Lowagie Sep 08 '17 at 10:58

1 Answer


Can anyone spot where the problem is?

Your main problem is that you use the contents of the MemoryStream before the Document and PdfCopy have finished creating the PDF; they only finish during the Dispose at the end of the using block. As a result, you save an incomplete PDF file.

Doing it like this instead should work:

    using (MemoryStream stream = new MemoryStream())
    {
        using (Document doc = new Document())
        {
            PdfCopy pdf = new PdfCopy(doc, stream);
            pdf.CloseStream = false;
            doc.Open();

            PdfReader reader = null;
            PdfImportedPage page = null;

            foreach (var file in Directory.GetFiles(sourceFolder))
            {
                reader = new PdfReader(file);
                for (int i = 0; i < reader.NumberOfPages; i++)
                {
                    page = pdf.GetImportedPage(reader, i + 1);
                    pdf.AddPage(page);
                }

                pdf.FreeReader(reader);
                reader.Close();
            }
        }
        using (FileStream streamX = new FileStream(destinationFile, FileMode.Create))
        {
            stream.WriteTo(streamX);
        }
    }

BTW, you can also see here that I did not put the PdfCopy into a using block. This is because the Document implicitly closes the PdfCopy when it is disposed. Disposing the PdfCopy first and then the Document (which tries to close the PdfCopy again) is therefore unnecessary, and exceptions occurring in this closing circus can hide exceptions thrown from within the block.

Furthermore, I needed to set pdf.CloseStream = false; otherwise the memory stream would have been closed when the PdfCopy is closed.


That being said,

  1. Of course you should also use AddDocument instead of iterating over the document pages yourself as already explained by @Bruno.
  2. Your memory footprint would decrease if you immediately wrote to the file stream instead of the memory stream.
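
Both points combined might look roughly like the sketch below. It assumes iTextSharp 5.5.x, as mentioned in the comments. The class and method names (PdfMerger, MergeFolder), the "*.pdf" filter, and the sorting of the file names are my own additions for illustration and not part of the original code. It also assumes that the destination file is not located inside the source folder, and that the folder contains at least one PDF (closing a document without pages will most likely throw).

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMerger
{
    // Hypothetical helper: merges all PDFs of a folder directly into a file.
    public static void MergeFolder(string destinationFile, string sourceFolder)
    {
        // Write straight to the file; no intermediate MemoryStream is needed.
        using (FileStream output = new FileStream(destinationFile, FileMode.Create))
        using (Document doc = new Document())
        {
            // No using block here: disposing doc also closes the PdfCopy.
            PdfCopy copy = new PdfCopy(doc, output);
            doc.Open();

            // Assumption: only *.pdf files are wanted, merged in name order.
            string[] files = Directory.GetFiles(sourceFolder, "*.pdf");
            Array.Sort(files, StringComparer.OrdinalIgnoreCase);

            foreach (string file in files)
            {
                PdfReader reader = new PdfReader(file);
                try
                {
                    // Copies all pages of the source document at once.
                    copy.AddDocument(reader);
                }
                finally
                {
                    reader.Close();
                }
            }
        } // doc is disposed first (this completes the PDF), then output.
    }
}
```

Because the using blocks are disposed in reverse order, the Document (and with it the PdfCopy) is closed before the FileStream, so the file only ever receives a complete PDF. Exceptions are deliberately not swallowed here; the caller can decide whether to map them to a Boolean result as in the question.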
mkl