At work I sometimes have to merge anywhere from a few to a few hundred PDF files. Until now I have been using the PdfWriter and PdfImportedPage classes. But once all the files are merged into one, the result is enormous, roughly the sum of the sizes of all the merged files, because the fonts are embedded with every page rather than once for the whole document, so they are never reused.

Not long ago I found out about the PdfSmartCopy class, which reuses embedded fonts and images. And here the problem kicks in. Very often, before merging the files together, I have to add extra content to them (images, text). For this I usually use the PdfContentByte from the PdfWriter object.

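Document doc = new Document();
// Verbatim string: in a regular literal, "\t" would be read as a tab character.
PdfWriter writer = PdfWriter.GetInstance(doc, new FileStream(@"C:\test.pdf", FileMode.Create));
doc.Open(); // the document has to be open before content is drawn
PdfContentByte cb = writer.DirectContent;
cb.Rectangle(100, 100, 100, 100);
cb.SetColorStroke(BaseColor.RED);
cb.SetColorFill(BaseColor.RED);
cb.FillStroke();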

When I do a similar thing with a PdfSmartCopy object, the pages are merged but no additional content is added. The full code of my test with PdfSmartCopy:

using (Document doc = new Document())
{
    using (PdfSmartCopy copy = new PdfSmartCopy(doc, new FileStream(Path.GetDirectoryName(pdfPath[0]) + "\\testas.pdf", FileMode.Create)))
    {
        doc.Open();
        PdfContentByte cb = copy.DirectContent;
        for (int i = 0; i < pdfPath.Length; i++)
        {
            PdfReader reader = new PdfReader(pdfPath[i]);
            for (int ii = 0; ii < reader.NumberOfPages; ii++)
            {
                PdfImportedPage import = copy.GetImportedPage(reader, ii + 1);
                copy.AddPage(import);
                cb.Rectangle(100, 100, 100, 100);
                cb.SetColorStroke(BaseColor.RED);
                cb.SetColorFill(BaseColor.RED);
                cb.FillStroke();
                doc.NewPage(); // not a necessary line
                //ColumnText col = new ColumnText(cb);
                //col.SetSimpleColumn(100,100,500,500);
                //col.AddText(new Chunk("wdasdasd", PdfFontManager.GetFont(@"C:\Windows\Fonts\arial.ttf", 20)));
                //col.Go();
            }
        }
    }
}

Now I have a few questions:

  1. Is it possible to edit a PdfSmartCopy object's DirectContent?
  2. If not, is there another way to merge multiple PDF files into one without dramatically increasing its size, while still being able to add extra content to the pages during the merge?
Masius

3 Answers

First this: using PdfWriter/PdfImportedPage is not a good idea. You throw away all interactive features! Being the author of iText, I find it very frustrating to see so many people making the same mistake, in spite of the fact that I wrote two books about this, and in spite of the fact that I convinced my publisher to offer one of the most important chapters for free: http://www.manning.com/lowagie2/samplechapter6.pdf

Is my writing really that bad? Or is there another reason why people keep on merging documents using PdfWriter/PdfImportedPage?

As for your specific questions, here are the answers:

  1. Yes. Download the sample chapter and search the PDF file for PageStamp.
  2. Only if you create the PDF in two passes. For instance: create the huge PDF first, then reduce the size by passing it through PdfCopy; or create the merged PDF first with PdfCopy, then add the extra content in a second pass using PdfStamper.
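
A minimal sketch of the second two-pass variant (merge first, add the extra content afterwards), assuming iTextSharp 5.x. The class name `PdfMergeSketch`, the method `MergeThenStamp` and its parameters are hypothetical names made up for this illustration, and the red square is just the test content from the question:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfMergeSketch
{
    // Hypothetical helper: merges `sources` into `mergedPath`,
    // then writes a stamped copy of the merged file to `stampedPath`.
    public static void MergeThenStamp(string[] sources, string mergedPath, string stampedPath)
    {
        // Pass 1: merge with PdfSmartCopy so shared fonts and images are reused.
        using (var fs = new FileStream(mergedPath, FileMode.Create))
        {
            var doc = new Document();
            var copy = new PdfSmartCopy(doc, fs);
            doc.Open();
            foreach (string src in sources)
            {
                var reader = new PdfReader(src);
                for (int p = 1; p <= reader.NumberOfPages; p++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, p));
                }
                copy.FreeReader(reader); // release the reader's resources in the copy
                reader.Close();
            }
            doc.Close(); // finishes the merged file
        }

        // Pass 2: add the extra content to every page with PdfStamper.
        var mergedReader = new PdfReader(mergedPath);
        using (var fs = new FileStream(stampedPath, FileMode.Create))
        {
            var stamper = new PdfStamper(mergedReader, fs);
            for (int p = 1; p <= mergedReader.NumberOfPages; p++)
            {
                PdfContentByte cb = stamper.GetOverContent(p);
                cb.SetColorStroke(BaseColor.RED);
                cb.SetColorFill(BaseColor.RED);
                cb.Rectangle(100, 100, 100, 100);
                cb.FillStroke();
            }
            stamper.Close(); // writes the stamped file
        }
        mergedReader.Close();
    }
}
```

The intermediate merged file is the price of this approach: the content is added in a pass of its own, so the merge loop stays free of any drawing code.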
Bruno Lowagie
  • Thank you for accepting the answer. I hope it helps people finding their way to the free chapter ;-) – Bruno Lowagie Oct 08 '12 at 06:20
  • To answer your rhetorical question: Because I never need interactive PDF features. I'm doing things like merging, bursting based on content, duplexing, and reorganizing PDF reports. None of the PDFs I'm manipulating with iTextSharp have so much as a clickable URL or bookmark navigation. The vast majority are PDF format v1.3. I'm just not losing anything, and if I were, I wouldn't need what I was losing. – Bacon Bits Nov 14 '18 at 15:47

The code after applying Bruno Lowagie's answer:

for (int i = 0; i < pdfPath.Length; i++)
{
    PdfReader reader = new PdfReader(pdfPath[i]);
    PdfImportedPage page;
    PdfSmartCopy.PageStamp stamp;
    for (int ii = 0; ii < reader.NumberOfPages; ii++)
    {
        page = copy.GetImportedPage(reader, ii + 1);
        stamp = copy.CreatePageStamp(page);
        PdfContentByte cb = stamp.GetOverContent();
        cb.Rectangle(100, 100, 100, 100);
        cb.SetColorStroke(BaseColor.RED);
        cb.SetColorFill(BaseColor.RED);
        cb.FillStroke();
        stamp.AlterContents(); // don't forget to add this line
        copy.AddPage(page);
    }
}
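
The loop above relies on a `copy` object and a `pdfPath` array created elsewhere. For completeness, here is a self-contained sketch of the same single-pass PageStamp approach, assuming iTextSharp 5.x. The class `SmartCopyStampSketch`, the method `MergeWithStamp` and its parameters are hypothetical names used only for illustration:

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class SmartCopyStampSketch
{
    // Hypothetical helper: merges `sources` into `outputPath`
    // and draws a red square on every page while merging.
    public static void MergeWithStamp(string[] sources, string outputPath)
    {
        using (var fs = new FileStream(outputPath, FileMode.Create))
        {
            var doc = new Document();
            var copy = new PdfSmartCopy(doc, fs);
            doc.Open();
            foreach (string src in sources)
            {
                var reader = new PdfReader(src);
                for (int p = 1; p <= reader.NumberOfPages; p++)
                {
                    PdfImportedPage page = copy.GetImportedPage(reader, p);
                    // The stamp gives access to a content layer on top of the imported page.
                    PdfCopy.PageStamp stamp = copy.CreatePageStamp(page);
                    PdfContentByte cb = stamp.GetOverContent();
                    cb.SetColorStroke(BaseColor.RED);
                    cb.SetColorFill(BaseColor.RED);
                    cb.Rectangle(100, 100, 100, 100);
                    cb.FillStroke();
                    stamp.AlterContents(); // apply the stamp before the page is added
                    copy.AddPage(page);
                }
                copy.FreeReader(reader); // release the reader's resources in the copy
                reader.Close();
            }
            doc.Close(); // finishes the merged file
        }
    }
}
```

The important detail is the order inside the inner loop: the stamp is created and applied with `AlterContents()` before `AddPage` is called for that page.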
Masius

"2. Only if you create the PDF in two passes. For instance: create the huge PDF first, then reduce the size by passing it through PdfCopy; or create the merged PDF first with PdfCopy, then add the extra content in a second pass using PdfStamper."

It is much more difficult to use PdfStamper in a second pass. When you're working with lots of data, it's far easier to stamp a single PDF and then append it.

PdfCopyFields used to work well for this, but it stopped working as of the 5.4.4.0 release, which is why I'm here.