I am using iTextSharp with C# in Visual Studio 2010, and I've recently run into the following situation. I received several ebooks, each split into numerous PDF files. The files had galley marks in the margins, which I removed by cropping each page with the following code:
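    // i is the 1-based page number; offset is the width of the margin being trimmed
    float x = reader.GetPageSize(i).Width;
    float y = reader.GetPageSize(i).Height;
    // New page size: the original shrunk by 52 points in each dimension
    iTextSharp.text.Rectangle tRect =
        new iTextSharp.text.Rectangle(x - 52, y - 52);
    Document document = new Document(tRect);
    // FileMode.Create truncates an existing file; OpenOrCreate can leave stale trailing bytes
    PdfWriter writer = PdfWriter.GetInstance(document,
        new FileStream(dest, FileMode.Create));
    document.Open();
    PdfContentByte content = writer.DirectContent;
    PdfImportedPage page = writer.GetImportedPage(reader, i);
    // Shift the imported page down and to the left so the galley marks fall outside the new page
    content.AddTemplate(page, -offset, -offset);
    document.NewPage();
    document.SetMargins(0, 0, 0, 0);
    document.Close();
    reader.Close();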
Of course, this is enclosed in a for loop with i as the page number. After iterating through every page in the portion I'm working on, I merge the resulting files back together with the following code:
    private void mergePDF(string fName, string folderPath)
    {
        // Every file in the folder is treated as a PDF to append
        string[] files = Directory.GetFiles(folderPath);
        iTextSharp.text.Document tDoc = new iTextSharp.text.Document();
        iTextSharp.text.pdf.PdfCopy copy =
            new iTextSharp.text.pdf.PdfCopy(tDoc,
                new FileStream(fName, FileMode.Create));
        tDoc.Open();
        for (int i = 0; i < files.Length; i++)
        {
            iTextSharp.text.pdf.PdfReader reader =
                new iTextSharp.text.pdf.PdfReader(files[i]);
            int n = reader.NumberOfPages;
            // iTextSharp page numbers are 1-based
            for (int page = 1; page <= n; page++)
            {
                copy.AddPage(copy.GetImportedPage(reader, page));
            }
            // Release the reader's resources before moving to the next file
            copy.FreeReader(reader);
            reader.Close();
        }
        tDoc.Close();
    }
After this is complete, I find that the merged file is roughly double the size of the original (one file in particular weighed in at 20,180 KB before processing and 41,322 KB after)!
I did some digging, and it seems that when a PDF is split with iTextSharp, the fonts for the complete PDF are embedded in every file that is split off. Apparently this can account for 50-80% of the file size.
That being said, does anyone know of a way to remove the embedded fonts from a PDF using iTextSharp? My plan is to include them only in the first PDF file, so that when the PDF is recompiled there will be only one copy of the fonts in the document and the file size will be more reasonable.
Also of note, this code is a close approximation of my actual code - the logic is identical but some variables have been added for size and flow considerations.
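The per-page loop mentioned above can be sketched roughly as follows. This is an illustrative reconstruction under assumptions, not the exact code: the `CropSketch` class, the `SplitAndCrop` method, the `src` and `outFolder` parameters, the zero-padded output file names, and the 26-point `offset` (assumed to be half of the 52-point reduction) are all hypothetical.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class CropSketch
{
    // Hypothetical helper: writes each page of src as its own cropped PDF in outFolder.
    public static void SplitAndCrop(string src, string outFolder)
    {
        const float offset = 26f; // assumption: half of the 52-point reduction
        PdfReader reader = new PdfReader(src);
        try
        {
            // iTextSharp page numbers are 1-based
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                Rectangle size = reader.GetPageSize(i);
                Rectangle tRect = new Rectangle(size.Width - 2 * offset,
                                                size.Height - 2 * offset);
                // Hypothetical naming scheme: 0001.pdf, 0002.pdf, ...
                string dest = Path.Combine(outFolder, i.ToString("D4") + ".pdf");
                using (FileStream fs = new FileStream(dest, FileMode.Create))
                {
                    Document document = new Document(tRect, 0, 0, 0, 0);
                    PdfWriter writer = PdfWriter.GetInstance(document, fs);
                    document.Open();
                    PdfImportedPage page = writer.GetImportedPage(reader, i);
                    // Shift the page so the trimmed margins fall outside the new page
                    writer.DirectContent.AddTemplate(page, -offset, -offset);
                    document.Close();
                }
            }
        }
        finally
        {
            // Close the reader once, after all pages have been written
            reader.Close();
        }
    }
}
```

In this sketch the reader is opened once and closed after the loop, and each single-page file gets its own writer, which matches the one-file-per-page output that mergePDF then consumes.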