I have PDF files that have to be merged into one. This is not an issue if I have all the PDF files at one time. However, the PDF files come in stages, and because of the workflow we need to merge them as they arrive.

So the workflow looks like this:

1: If no PDF file exists, create one, merge the first set of pages into the new PDF file, then close it.

2: If a PDF already exists (the target one), open it, merge the new pages into the target PDF file, then close it.

3: Repeat.

Below is the code I have, but it just overwrites the pages that were previously inserted. So my question is: how do I insert PDF files into an already existing PDF file using PdfSmartCopy?

I need to use PdfSmartCopy because I need to optimize the fonts. I found this nice document on StackOverflow that shows how to append, but it uses techniques other than PdfSmartCopy.

Note: I'm not sure whether I need to create an intermediate PDF file to hold the target PDF pages, then delete the target PDF, and then save the intermediate PDF as the target PDF. Before I go and do that, I was wondering if there is a way to do it without the intermediate step.

using (FileStream stream = new FileStream(targetFile, FileMode.OpenOrCreate))
{
  Document pdfDoc = new Document(PageSize.LETTER);
  PdfSmartCopy pdf = new PdfSmartCopy(pdfDoc, stream);
  pdfDoc.Open();
  foreach (string file in files)
  {
    PdfReader reader = new PdfReader(file);
    pdf.AddDocument(reader);
    pdf.FreeReader(reader) ;
    reader.Close();
  }  

}
Mike
  • I think this is what you are looking for: [Edit DirectContent of iTextSharp PdfSmartCopy class](http://stackoverflow.com/a/12741148/1851377). If not, I think it will get you going in the direction you need. I have not used `PdfSmartCopy`. Hope it helps. – Eric Apr 29 '16 at 16:04
  • Hi Eric, thank you, but I will end up with the same problem. The file will get overwritten with the new data and the old data will be lost. – Mike Apr 29 '16 at 16:39
  • When I merge PDFs together, I create subfolders with the originals and the file that I'm merging everything into. I create a breadcrumb trail so that if something goes wrong I still have everything from the start. – Eric Apr 29 '16 at 16:43
  • @Eric, no, what I mean is that in the merged PDF file (the target) the pages get overwritten by the PDF file I'm trying to merge. So if my first merge was 10 pages, that works, and if the second merge is 2 pages I should have 12 pages, but I have only two in the target PDF. I'm starting to lean towards trying to do this in memory; however, these files are big, 37,000+ pages. – Mike Apr 29 '16 at 16:51
  • Ooooo... so after looking at the source, it looks like you can use the same code that you would for `PdfCopy`. I can add below the code that I use for merging files together, if you would like. You would just need to replace `PdfCopy` with `PdfSmartCopy`. – Eric Apr 29 '16 at 17:16
  • @eric sure, please do. It might help. – Mike Apr 29 '16 at 17:23
  • Your major issue is that you seem to think that the file you write the `PdfCopy` to will keep its original contents. It won't. It will simply be overwritten with the new contents. If you want the former contents of the target file to be included, you have to add them to the `PdfCopy` explicitly, either via a `PdfReader` of that file you open beforehand or by targeting a different file, reading the original contents intermittently, and later renaming the temporary file. – mkl May 01 '16 at 02:56
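
The second option in mkl's comment (write to a different file, include the existing target as the first source, then rename) could look roughly like the sketch below. It is only an illustration under assumptions, not a tested solution: the `PdfAppender` class, the `AppendToTarget` method, and the ".tmp" suffix are hypothetical names, and the `AddDocument`/`FreeReader` pattern is taken from the code in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfAppender
{
    // Hypothetical helper: appends newFiles to targetFile,
    // creating targetFile if it does not exist yet.
    public static void AppendToTarget(string targetFile, IEnumerable<string> newFiles)
    {
        // Assumption: a sibling ".tmp" path is free to use as scratch space.
        string tempFile = targetFile + ".tmp";

        // The existing target (if any) goes first so its pages are kept,
        // followed by the newly arrived files.
        var sources = new List<string>();
        if (File.Exists(targetFile))
        {
            sources.Add(targetFile);
        }
        sources.AddRange(newFiles);

        // Nothing to merge; closing a Document without pages would throw.
        if (sources.Count == 0)
        {
            return;
        }

        using (var stream = new FileStream(tempFile, FileMode.Create))
        {
            var document = new Document();
            var copy = new PdfSmartCopy(document, stream);
            document.Open();
            foreach (string file in sources)
            {
                var reader = new PdfReader(file);
                copy.AddDocument(reader);   // same calls as in the question
                copy.FreeReader(reader);
                reader.Close();
            }
            document.Close();               // finishes writing the temp file
        }

        // Swap the temp file in as the new target (not an atomic operation).
        if (File.Exists(targetFile))
        {
            File.Delete(targetFile);
        }
        File.Move(tempFile, targetFile);
    }
}
```

Each call rewrites the whole target, so with very large targets (such as the 37,000+ pages mentioned above) every incremental merge costs a full copy of everything merged so far.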

1 Answer

Here is the code that I use to merge PDFs together. As I said above, from what I see in the source code, it looks like you can replace PdfCopy with PdfSmartCopy. Also, based on the comments, PdfSmartCopy will use more memory so that it can keep references to the resources. It inherits from the PdfCopy class, which leads me to think that just replacing PdfCopy with PdfSmartCopy should work.

I hope this gets you going in the right direction. Also, here is a link to the source code if you want to take a look: https://github.com/itext/itextsharp

// FileMode.Create overwrites outFile if it already exists.
var document = new Document();
var writer = new PdfCopy(document, new FileStream(outFile, FileMode.Create));
document.Open();
foreach (var fileName in groupfiles)
{
    var reader = new PdfReader(Path.Combine(config.WorkingDirectory, fileName));
    // Page numbers are 1-based.
    for (var i = 1; i <= reader.NumberOfPages; i++)
    {
        var page = writer.GetImportedPage(reader, i);
        writer.AddPage(page);
    }
    reader.Close();
}
writer.Close();
document.Close();
Eric
  • Thank you, but the code above works to merge the PDF files if I HAD all the PDF files to start with. As my workflow states, I could have 5, then 30 minutes later I get another, say, 10. Because of timing issues we have to merge the PDFs as they come in, and therein lies the problem. It seems to me that PdfCopy etc. is designed to work only with new documents, not existing documents... that's the challenge. But thank you very much for trying to help. – Mike Apr 29 '16 at 17:46
  • @Eric - were you able to find a solution to this problem? – STLDev Oct 05 '16 at 03:47
  • Thanks for sharing this code. @Eric, I have one more question: is it possible to combine them even if they do not become a PDF file? – Kun-Yao Wang May 18 '17 at 12:07
  • @Kun-yaoWang - I'm not entirely sure I understand what you are trying to do. If you are looking to combine the files in memory, I'm sure you can do that, but be careful: at some point you may run out of memory. – Eric May 18 '17 at 22:36
  • That is a good point. I finished this work by producing all the PDFs and then combining them together. Thanks for the reminder, @Eric. – Kun-Yao Wang May 22 '17 at 07:25