
I am using iText to merge several PDF documents in Java, basically with PdfCopy. I am facing the following problems:

  1. The same fonts from the different component files are repeated in the final PDF, which ends up containing hundreds of instances of the same font.

  2. The bigger problem: I am getting the Arial MT font instead of Arial. What exactly is the difference between the two? Is this an iText issue or an Acrobat issue? I have cross-checked: there is no ArialMT.ttf file in my Windows Fonts directory, only Arial.ttf. Since this makes my product useless, how can it be resolved?

mkl
Kapil
  • **1**: Have you tried `PdfSmartCopy` instead of `PdfCopy` yet? It is optimized to re-use resources like fonts or images; on the downside, though, it requires more memory to execute the merge. **2**: iText does not exchange fonts like that. Thus, there is some other issue at work underneath. Can you provide sample input and output PDFs and the pivotal source for analysis? – mkl Jun 03 '15 at 14:05
  • Fun fact: PDF files don't contain fonts, they contain derivatives of fonts. Just because both files say they use font X does not mean they both use the same *derivative* of font X, and so unless PdfCopy is smart enough to unify different font subsets, the two font resources in the PDFs actually *are* different. – Mike 'Pomax' Kamermans Jun 04 '15 at 05:30
  • But Mike, do you have any idea about the Arial MT font issue? – Kapil Jun 04 '15 at 10:20

1 Answer


Question 1:

You claim that you merge different PDFs with identical fonts and that these fonts are repeated. Please note that the premise of your allegation is probably wrong.

Every separate PDF file may contain a subset of that font. Different files will require different font subsets, and neither PdfCopy nor PdfSmartCopy can merge font subsets. This can result in a bloated PDF file with far too many subsets of the same font. (This paragraph was copy/pasted from "How to parse multiple HTML files into a single PDF?")

How do you know whether you are dealing with font subsets? That is answered here: "What are the extra characters in the font name of my PDF?"

If you look at the Fonts tab under Document Properties in Adobe Reader, you'll see something like "embedded subset".
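The "extra characters" mentioned above are the subset tag: in a PDF, the name of an embedded font subset is prefixed with six uppercase letters and a plus sign. As a minimal sketch in plain Java (no iText involved), the check below recognizes that prefix and strips it so that subsets of the same base font can be compared. The class name `SubsetTag`, its helper methods, and the sample font names are hypothetical and only serve as illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubsetTag {
    // Subset font names look like "ABCDEF+ArialMT":
    // six uppercase letters, a plus sign, then the base font name.
    private static final Pattern SUBSET = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    // True if the font name carries a subset tag.
    static boolean isSubset(String fontName) {
        return SUBSET.matcher(fontName).matches();
    }

    // Returns the name without the subset tag, or the name unchanged if there is none.
    static String baseName(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(1) : fontName;
    }

    public static void main(String[] args) {
        // Hypothetical font names as they might appear in a merged PDF.
        String[] names = {"ABCDEF+ArialMT", "GHIJKL+ArialMT", "ArialMT"};
        for (String name : names) {
            System.out.println(name + " -> base=" + baseName(name) + ", subset=" + isSubset(name));
        }
    }
}
```

Two names with different tags but the same base name, such as the first two samples, point to two separately created subsets of one font, which is the situation that leads to repeated font entries after merging.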

Question 2:

If you look in your Windows font directory, you'll find a font file arial.ttf. That is the font file for Arial MT. The MT stands for the company that designed Arial. See "Does one need to have a license for fonts if we are using ttf files in itext?"

This is what I see when I look at the properties for arial.ttf on Windows:

[Image: arial.ttf Properties]

Under company, you can read "The Monotype Corporation". MT is the abbreviation of Monotype.

However, none of this matters, as you are merging existing PDFs that contain existing, embedded fonts. In that case, iText doesn't care which fonts are or are not available on Windows. It just takes the fonts as defined in the existing PDFs, and if those fonts are named Arial MT, then that is the name iText is going to use.

Extra tip:

All the questions I refer to are bundled in a free ebook, "The Best iText Questions on StackOverflow". It is really worth downloading that book. I used it to quickly find all the answers relevant to your question based on previous StackOverflow posts.

Bruno Lowagie