18

I have several datasheets for products. Each is a separate file. What I want to do is to use iText to generate a summary / recommended set of actions, based on answers to a webform, and then append to that all the relevant datasheets. This way, I only need to open one new tab in the browser to print all information, rather than opening one for the summary, and one for each datasheet that is needed.

So, is it possible to do this using iText?

marcolopes
Matt

7 Answers

35

Yes, you can merge PDFs using iText 7. For example, see the iText 7 Jump-Start tutorial sample C06E04_88th_Oscar_Combine; the pivotal code is:

PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
PdfMerger merger = new PdfMerger(pdf);

//Add pages from the first document
PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());

//Add pages from the second pdf document
PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());

firstSourcePdf.close();
secondSourcePdf.close();
pdf.close();

(C06E04_88th_Oscar_Combine method createPdf)


Depending on your use case, you might want to use the PdfDenseMerger with its helper class PageVerticalAnalyzer instead of the PdfMerger here. It attempts to put content from multiple source pages onto a single target page and corresponds to the iText 5 PdfVeryDenseMergeTool from this answer. Due to the nature of PDF files this only works for PDFs without headers, footers, and similar artifacts.

mkl
  • Do the Documents need to be closed at the end, too? If so, which should be closed first - the Documents or the PdfDocuments? – B. Clay Shannon-B. Crow Raven Jun 26 '20 at 14:28
  • 1
    The code in the answer only contains `PdfDocument` instances, no `Document` instances. Which *Documents* do you mean? – mkl Jun 26 '20 at 14:49
  • In my code, I use the Document object to add paragraphs to. If you don't use it, then I guess I'll just experiment with it - first closing one set of objects, then the other, and see what happens/doesn't happen. – B. Clay Shannon-B. Crow Raven Jun 27 '20 at 01:30
  • 1
    If I remember correctly, closing a `Document` automatically closes the underlying `PdfDocument`. Nonetheless, I'd usually close both, usually because I use them in try-with-resources structures anyway. – mkl Jun 27 '20 at 05:59
  • How can I return a `byte[]` when using `PdfMerger`? I am having a hard time with this: https://stackoverflow.com/questions/63010524/pdfdocument-to-byte-using-pdfmerger-itext7 – Amol Bais Jul 21 '20 at 09:05
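
The two points raised in the comments above (closing documents via try-with-resources, and getting the result as a `byte[]`) can be combined in one Java sketch. It assumes iText 7 is on the classpath; the class name `PdfBytesMerger` and the method name `merge` are made up for illustration and are not part of iText. The target document writes into a `ByteArrayOutputStream`, and the bytes are read only after the target `PdfDocument` has been closed, because closing is what finishes the file.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.utils.PdfMerger;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

// Hypothetical helper class, not part of iText.
public class PdfBytesMerger {

    // Merges the given PDFs (each passed as a byte array) in list order
    // and returns the merged PDF as a byte array.
    public static byte[] merge(List<byte[]> sources) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // The target document is closed at the end of this try block,
        // which finishes writing the merged PDF into 'out'.
        try (PdfDocument target = new PdfDocument(new PdfWriter(out))) {
            PdfMerger merger = new PdfMerger(target);

            for (byte[] source : sources) {
                // Each source document is closed as soon as its pages are merged.
                try (PdfDocument sourcePdf = new PdfDocument(
                        new PdfReader(new ByteArrayInputStream(source)))) {
                    merger.merge(sourcePdf, 1, sourcePdf.getNumberOfPages());
                }
            }
        }

        // Read the bytes only after the target document has been closed.
        return out.toByteArray();
    }
}
```

For the use case in the question, the first element of the list would be the generated summary and the remaining elements the relevant datasheets.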
5

I found a solution that works quite well.

public byte[] Combine(IEnumerable<byte[]> pdfs)
{
    using (var writerMemoryStream = new MemoryStream())
    {
        using (var writer = new PdfWriter(writerMemoryStream))
        {
            using (var mergedDocument = new PdfDocument(writer))
            {
                var merger = new PdfMerger(mergedDocument);

                foreach (var pdfBytes in pdfs)
                {
                    using (var copyFromMemoryStream = new MemoryStream(pdfBytes))
                    {
                        using (var reader = new PdfReader(copyFromMemoryStream))
                        {
                            using (var copyFromDocument = new PdfDocument(reader))
                            {
                                merger.Merge(copyFromDocument, 1, copyFromDocument.GetNumberOfPages());
                            }
                        }
                    }
                }
            }
        }

        return writerMemoryStream.ToArray();
    }
}

Usage:

DirectoryInfo d = new DirectoryInfo(INPUT_FOLDER);
            
var pdfList = new List<byte[]> { };

foreach (var file in d.GetFiles("*.pdf"))
{
    pdfList.Add(File.ReadAllBytes(file.FullName));
}


File.WriteAllBytes(OUTPUT_FOLDER + "\\merged.pdf", Combine(pdfList));

Source: https://www.nikouusitalo.com/blog/combining-pdf-documents-using-itext7-and-c/

Jean Paul Beard
3

If you want to merge two byte arrays and return the result as a single PDF/A byte array:

public static byte[] mergePDF(byte [] first, byte [] second) throws IOException {
    // Initialize PDF writer
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    PdfWriter writer = new PdfWriter(arrayOutputStream);
    

    // Initialize PDF document
    PdfADocument pdf = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_1B, new PdfOutputIntent("Custom", "",
            "https://www.color.org", "sRGB IEC61966-2.1", new FileInputStream("sRGB_CS_profile.icm")));


    PdfMerger merger = new PdfMerger(pdf);

    //Add pages from the first document
    PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(first)));
    merger.merge(firstSourcePdf, 1, firstSourcePdf.getNumberOfPages());

    //Add pages from the second pdf document
    PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(new ByteArrayInputStream(second)));
    merger.merge(secondSourcePdf, 1, secondSourcePdf.getNumberOfPages());
    
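    firstSourcePdf.close();
    secondSourcePdf.close();
    // Closing the document finishes the output and also closes the underlying writer,
    // so the writer must not be closed before the document.
    pdf.close();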


    return arrayOutputStream.toByteArray();
}
mesompi
2

The question doesn't specify the language, so I'm adding an answer using C#; this works for me. I'm creating three separate but related PDFs then combining them into one.

After creating the three separate PDF docs and adding data to them, I combine them this way:

PdfDocument pdfCombined = new PdfDocument(new PdfWriter(destCombined));
PdfMerger merger = new PdfMerger(pdfCombined);

PdfDocument pdfReaderExecSumm = new PdfDocument(new PdfReader(destExecSumm));
merger.Merge(pdfReaderExecSumm, 1, pdfReaderExecSumm.GetNumberOfPages());

PdfDocument pdfReaderPhrases = new PdfDocument(new PdfReader(destPhrases));
merger.Merge(pdfReaderPhrases, 1, pdfReaderPhrases.GetNumberOfPages());

PdfDocument pdfReaderUncommonWords = new PdfDocument(new PdfReader(destUncommonWords));
merger.Merge(pdfReaderUncommonWords, 1, pdfReaderUncommonWords.GetNumberOfPages());

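// Close the source documents, then the combined document (closing it writes the output file)
pdfReaderExecSumm.Close();
pdfReaderPhrases.Close();
pdfReaderUncommonWords.Close();
pdfCombined.Close();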

So the combined PDF is a PdfDocument backed by a PdfWriter, the merged pieces are PdfDocuments backed by PdfReaders, and the PdfMerger is the glue that binds them all together.

B. Clay Shannon-B. Crow Raven
0

Here is the minimum C# code needed to merge file1.pdf and file2.pdf into a new merged.pdf:

var path = @"C:\Temp\";

var src0 = System.IO.Path.Combine(path, "merged.pdf");
var wtr0 = new PdfWriter(src0);
var pdf0 = new PdfDocument(wtr0);

var src1 = System.IO.Path.Combine(path,  "file1.pdf");
var fi1 = new FileInfo(src1);
var rdr1= new PdfReader(fi1);
var pdf1 = new PdfDocument(rdr1);

var src2 = System.IO.Path.Combine(path,  "file2.pdf");
var fi2 = new FileInfo(src2);
var rdr2 = new PdfReader(fi2);
var pdf2 = new PdfDocument(rdr2);

var merger = new PdfMerger(pdf0);

merger.Merge(pdf1, 1, pdf1.GetNumberOfPages());
merger.Merge(pdf2, 1, pdf2.GetNumberOfPages());

// Close the sources, then the target; closing the target writes merged.pdf
pdf1.Close();
pdf2.Close();
pdf0.Close();
Daniel Williams
0

Here is a VB.NET solution using open source iText7 that can merge multiple PDF files to an output file.

Imports System.IO
Imports iText.Kernel.Pdf
Imports iText.Kernel.Utils

    Public Function Merge_PDF_Files(ByVal input_files As List(Of String), ByVal output_file As String) As Boolean
        Dim Input_Document As PdfDocument = Nothing
        Dim Output_Document As PdfDocument = Nothing
        Dim Merger As PdfMerger

        Try
            'Create the output document and a merger that writes into it'
            Output_Document = New PdfDocument(New PdfWriter(output_file))
            Merger = New PdfMerger(Output_Document)

            'Merge each input PDF file to the output document'
            For Each file As String In input_files
                Input_Document = New PdfDocument(New PdfReader(file))
                Merger.Merge(Input_Document, 1, Input_Document.GetNumberOfPages())
                Input_Document.Close()
            Next

            Output_Document.Close()
            Return True
        Catch ex As Exception
            'catch Exception if needed'
            If Input_Document IsNot Nothing Then Input_Document.Close()
            If Output_Document IsNot Nothing Then Output_Document.Close()
            File.Delete(output_file)

            Return False
        End Try
    End Function

USAGE EXAMPLE:

Dim success as boolean = false
Dim input_files_list As New List(Of String)

input_files_list.Add("c:\input_PDF1.pdf")
input_files_list.Add("c:\input_PDF2.pdf")
input_files_list.Add("c:\input_PDF3.pdf")

success = Merge_PDF_Files(input_files_list, "c:\output_PDF.pdf")
'Optional: handling errors'   
if success then 
    'Files merged'
else
    'Error merging files'
end if
cyberponk
0

I use this code, and it works:

using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

namespace iText7_Merge
{
    internal class MergedPdf
    {
        public const String SRC1 = @"C:\tmp\trash\Pdf_1.pdf";
        public const String SRC2 = @"C:\tmp\trash\Pdf_2.pdf";
        public const String DEST = @"C:\tmp\trash\Pdf_merged.pdf";


        static void Main(string[] args)
        {
            FileInfo file = new FileInfo(DEST);
            file.Directory.Create();
            new MergedPdf().createPdf(DEST);
        }

        public void createPdf(String dest)
        {
            //Initialize the target PDF document and the merger that writes into it
            PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
            PdfMerger merger = new PdfMerger(pdf);

            //Add pages from the first document
            PdfDocument firstSourcePdf = new PdfDocument(new PdfReader(SRC1));
            merger.Merge(firstSourcePdf, 1, firstSourcePdf.GetNumberOfPages());

            //Add pages from the second pdf document
            PdfDocument secondSourcePdf = new PdfDocument(new PdfReader(SRC2));
            merger.Merge(secondSourcePdf, 1, secondSourcePdf.GetNumberOfPages());

            firstSourcePdf.Close();
            secondSourcePdf.Close();
            pdf.Close();
        }
    }
}