C# iTextSharp Merge multiple pdf via byte array

Question

I am new to iTextSharp and to working with PDF files in general, but I think I'm on the right track.

I iterate through a list of PDF files, convert each one to bytes, and add the resulting byte arrays to a list. From there I pass the list to concatAndAddContent() to merge all of the PDFs into a single large PDF. Currently I'm only getting the last PDF in the list (they seem to be overwriting each other).

    public static byte[] concatAndAddContent(List<byte[]> pdfByteContent)
    {
        byte[] allBytes;

        using (MemoryStream ms = new MemoryStream())
        {
            Document doc = new Document();
            PdfWriter writer = PdfWriter.GetInstance(doc, ms);

            doc.SetPageSize(PageSize.LETTER);
            doc.Open();
            PdfContentByte cb = writer.DirectContent;
            PdfImportedPage page;

            PdfReader reader;
            foreach (byte[] p in pdfByteContent)
            {
                reader = new PdfReader(p);
                int pages = reader.NumberOfPages;

                // loop over document pages
                for (int i = 1; i <= pages; i++)
                {
                    doc.SetPageSize(PageSize.LETTER);
                    doc.NewPage();
                    page = writer.GetImportedPage(reader, i);
                    cb.AddTemplate(page, 0, 0);

                }
            }

            doc.Close();
            allBytes = ms.GetBuffer();
            ms.Flush();
            ms.Dispose();
        }

        return allBytes;
    }

The code above runs, but the output contains only a single PDF and the rest of the files are ignored. Any suggestions?

Possible duplicate of [Merging multiple PDFs using iTextSharp in c#.net](http://stackoverflow.com/questions/6029142/merging-multiple-pdfs-using-itextsharp-in-c-net) — Thomas Weller, Jul 12 '16 at 21:30
Are you sure that the pdfByteContent list contains more than one byte array? Can we see the code you use to call the function? — Kent Munthe Caspersen, Jul 12 '16 at 21:51
I don't know if it solves your problem, but it does not seem like you need to Dispose and Flush. Your "using" block will automatically dispose when exited. See http://stackoverflow.com/questions/21230314/c-sharp-flushing-streamwriter-and-a-memorystream. The Flush method is overridden to do nothing for the MemoryStream class (it is inherited from the Stream class, see https://msdn.microsoft.com/en-us/library/system.io.memorystream.flush(v=vs.110).aspx). — Kent Munthe Caspersen, Jul 12 '16 at 21:59
Instead of allBytes = ms.GetBuffer(); try using allBytes = ms.ToArray(); The former method returns all data in the buffer, which may include uninitialized bytes / garbage. See the remarks of https://msdn.microsoft.com/en-us/library/system.io.memorystream.getbuffer.aspx#remarksToggle — Kent Munthe Caspersen, Jul 12 '16 at 22:10
Without a doubt, as @KentMuntheCaspersen said, always use `ToArray()`, never use `GetBuffer()`. — Chris Haas, Jul 13 '16 at 00:00
See [this post](http://stackoverflow.com/a/23063576/231316) which shows you how to iterate over a bunch of `PdfReader` objects and use a simple `AddDocument()` method so that you don't need to iterate over pages. — Chris Haas, Jul 13 '16 at 00:02
@ThomasWeller Not a duplicate - that question does not take a byte array as an argument. — confusedandamused, Jul 13 '16 at 13:02
@ChrisHaas Are there any methods within PdfReader that take a Byte Array? — confusedandamused, Jul 13 '16 at 13:06
@KentMuntheCaspersen Changed ms.GetBuffer() to ms.ToArray() - same result. — confusedandamused, Jul 13 '16 at 13:10
@confusedandamused, yes, you can pass a byte array to the constructor. — Chris Haas, Jul 13 '16 at 13:38
@ChrisHaas Do you have any example code for this? I'm slightly confused by what you mean. — confusedandamused, Jul 13 '16 at 13:43
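
The GetBuffer() versus ToArray() point raised in the comments above can be illustrated with a minimal sketch that uses only System.IO. The class and method names (BufferDemo, Compare) are hypothetical and exist only for this illustration. It writes a known number of bytes to a MemoryStream and reports the length each method returns.

```csharp
using System;
using System.IO;

public static class BufferDemo
{
    // Writes `count` zero bytes to a fresh MemoryStream and reports the
    // lengths of the arrays returned by GetBuffer() and ToArray().
    public static void Compare(int count, out int bufferLength, out int arrayLength)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(new byte[count], 0, count);

            // GetBuffer() exposes the internal buffer. Its length is the
            // stream's capacity, which may exceed the bytes actually written.
            bufferLength = ms.GetBuffer().Length;

            // ToArray() copies exactly the bytes that were written.
            arrayLength = ms.ToArray().Length;
        }
    }
}
```

After writing 10 bytes, arrayLength is 10, while bufferLength is at least 10 and usually larger. Any extra trailing bytes would end up appended to the PDF if the GetBuffer() result were written to disk.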

score 25 · Accepted Answer · edited May 23 '17 at 12:10

This is pretty much just a C# version of Bruno's code here.

This is pretty much the simplest, safest, and recommended way to merge PDF files. The PdfSmartCopy object can detect redundancies across the input files, which can sometimes reduce the output file size. One of the overloads of its AddDocument() method accepts a full PdfReader object, which can be instantiated however you want, for example from a byte array.

public static byte[] concatAndAddContent(List<byte[]> pdfByteContent) {

    using (var ms = new MemoryStream()) {
        using (var doc = new Document()) {
            using (var copy = new PdfSmartCopy(doc, ms)) {
                doc.Open();

                //Loop through each byte array
                foreach (var p in pdfByteContent) {

                    //Create a PdfReader bound to that byte array
                    using (var reader = new PdfReader(p)) {

                        //Add the entire document instead of page-by-page
                        copy.AddDocument(reader);
                    }
                }

                doc.Close();
            }
        }

        //Return just before disposing
        return ms.ToArray();
    }
}

answered Jul 13 '16 at 14:05

Chris Haas

Testing this right now - I've never tried declaring variables within a using statement, and I currently receive the error "Type used in a using statement must be implicitly convertible to 'System.IDisposable'". Also, where did you find documentation/use of copy.AddDocument()? It seems as though PdfSmartCopy doesn't contain a definition for it. – confusedandamused Jul 13 '16 at 14:27
Are you using iTextSharp 4.1.6? If so this code won't work with that. – Chris Haas Jul 13 '16 at 14:39
Ah, I was on 4.x. I upgraded, and the function call worked. Currently it still seems to be overwriting each PDF I'm passing in via the bytes - could this be due to how I am passing in the byte array? I'm essentially taking each PDF, turning it into bytes, and placing them all into a List (which happens to be pdfByteContent). When calling File.WriteAllBytes() each time I see the overwriting happening as well. – confusedandamused Jul 13 '16 at 14:54
@ChrisHaas, could you take a look at this question about itextSharp? https://stackoverflow.com/questions/45599357/tables-instead-of-columns-for-pdf-creation . Thank you so much in advance. I have a few questions that I would like clarified. – Euridice01 Aug 10 '17 at 13:48
I'm seeing that this does merge the PDFs together, but I lose any form fields in the AcroFields property of the document. Trying to find a fix for that. I'm using 5.x right now. – Justin T. Watts Aug 31 '17 at 13:30
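
The calling code was never posted, so the cause of the "overwriting" described in the comments above is unknown. One possible cause is merging or writing inside the loop, so that each iteration replaces the previous output. The sketch below shows the alternative: load every file first, merge once, and write once. MergeDriver, MergeFiles, and the `merge` parameter are hypothetical names. `merge` stands in for a function such as concatAndAddContent, so the block compiles without iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MergeDriver
{
    // Reads each input file into its own byte array, merges once, writes once.
    // `merge` is a stand-in for a function such as concatAndAddContent.
    public static void MergeFiles(IEnumerable<string> inputPaths, string outputPath,
                                  Func<List<byte[]>, byte[]> merge)
    {
        var pdfByteContent = new List<byte[]>();
        foreach (string path in inputPaths)
        {
            // File.ReadAllBytes returns a new array on every call,
            // so each list entry holds a distinct file's content.
            pdfByteContent.Add(File.ReadAllBytes(path));
        }

        // Merge after the loop so the result covers every input.
        byte[] merged = merge(pdfByteContent);

        // Write a single time; WriteAllBytes replaces any existing
        // file at outputPath, so calling it per input keeps only the last one.
        File.WriteAllBytes(outputPath, merged);
    }
}
```

Assuming the merge method from the answer is in scope, a call might look like MergeDriver.MergeFiles(paths, "merged.pdf", concatAndAddContent), where `paths` is the list of input file names.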

Nambirajan · Answer 2 · 2023-02-22T12:32:35.107

0

// concatAndAddContent already returns the merged PDF as a single byte[],
// so the result can be written to disk directly.
byte[] merged = concatAndAddContent(bytes);

System.IO.File.WriteAllBytes("path", merged);

edited Feb 22 '23 at 12:32

answered Dec 31 '21 at 09:03

Add information about your answer. Code-only answers will not be very clear. – Venkataraman R Dec 31 '21 at 09:03
Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 31 '21 at 09:04
