How to remove duplicate fonts in a PDF file using iText 7

Question

I'm using the iText 7 library (v7.0.5.0). I create a PDF file (PDF/A-1B conformance) from a RadDiagram (Telerik library) in .NET.

When the PDF file is generated, its properties (Acrobat Reader > File > Properties > Fonts) list a lot of fonts carried by the file, but they come from only 4 base fonts, with variants (Arial, Segoe, Tahoma, TimesNewRoman).

I can see that there are a lot of duplicate fonts with the same name.

If I save the file from Acrobat Reader as a "reduced PDF file", all of the duplicate fonts are purged, keeping only 1 font for each font name.

I am looking for a way to remove these duplicate fonts programmatically, because they significantly increase the PDF file size. With the Acrobat Reader compression, the file size decreases from 2.2 MB to 906 KB (without quality loss).

You can find here an example of my PDF file.

This file has:

8 ArialMT
3 SegoeUI

This is just one example. Sometimes my files are very big, and the compression decreases the size from 16 MB to 1 MB, for example, because there are a lot of duplicate fonts.

[EDIT] About my use case:

I export Telerik RadDiagram objects to a PDF file, like an image. Each PDF file (with only 1 page) is serialized to Bytes() and saved in the database. At a specific step, all of the serialized PDFs are concatenated into one global PDF file. Clearly, the problem arises when I save each PDF file, because every time one is created I call this code:

_pdfFont = PdfFontFactory.CreateFont(FONT_PATH_ARIAL, PdfEncodings.IDENTITY_H, True)

Declarations:

Private Const FONT_PATH_ARIAL As String = "c:\windows\fonts\Arial.ttf"  
Private _pdfFont As PdfFont

The _pdfFont object is passed to every SetFont() call.

But the creation step is important, because when I close the Document object, it needs to know the font, which was created only for it.

In the end, it is the same font each time (they all have the same name), but the copies are not merged, so each one is added to the global PDF file.

[End Edit]

Thanks a lot.

In one of the comments to the question you refer to, a pdf file with embedded fonts was requested, but the question was abandoned by the OP. Don't be like that person. Put your pdf file online so others can try to reproduce your issue. — Amedee Van Gasse, Aug 03 '18 at 11:00
Indeed, we need to see the file. Maybe you don't see the fonts as being used, but maybe there are still some references to those fonts from the page resources. As long as there are references to an object, the object is not *unused*. iText doesn't go as far as to remove unused resources from page dictionaries (yet). — Bruno Lowagie, Aug 03 '18 at 11:12
You can find my example PDF file [here](https://onedrive.live.com/?authkey=%21ABlQn0gAKHhEl0k&cid=A4605E6B84968379&id=A4605E6B84968379%211453&parId=A4605E6B84968379%21143&o=OneUp). During my research I found that there are some duplicate fonts, and that the Acrobat _reduced size PDF File_ option deletes these duplicates, so that afterwards only 1 copy of each font remains. — Obiwan Kenobi, Aug 03 '18 at 13:06
So I think the real problem is probably "How to remove duplicate fonts to keep only one font per font type", no? Maybe I should update my post? — Obiwan Kenobi, Aug 03 '18 at 14:45
Yes please, you should update the post and include the link to the PDF there as well — Alexey Subach, Aug 04 '18 at 05:55
I edited the subject. I also tried to remove duplicate fonts using `PDFCopy` / `PDFSmartCopy` / `PDFStamper`, but the duplicate fonts are never deleted. In addition, I found the following approach to potentially get all fonts (and maybe delete them): `PdfDictionary formDictionary = pdfDocument.GetCatalog().GetPdfObject().GetAsDictionary(PdfName.All); PdfDictionary resources = formDictionary.GetAsDictionary(PdfName.DR); PdfDictionary fonts = resources.GetAsDictionary(PdfName.Font);` But `formDictionary` is always `Null`. — Obiwan Kenobi, Aug 06 '18 at 08:58
Another way could be to use the same `PdfFont` object in many `PdfDocument` objects because, in the end, I merge several `PdfDocument` objects (with 1 page each) into a single PDF document. The problem with this solution is that a `PdfFont` object must be used in only 1 `PdfDocument`. Do you know what I mean? — Obiwan Kenobi, Aug 06 '18 at 09:59
@BrunoLowagie I see in [this subject](https://stackoverflow.com/questions/44652992/itext7-generate-pdf-with-exception-pdf-indirect-object-belongs-to-other-pdf-doc) that you created an internal ticket about the same `PdfFont` object being used for many `PdfDocument` objects. Has this "bug" been solved? — Obiwan Kenobi, Aug 06 '18 at 10:23
The problem described in that ticket was solved. That doesn't mean you can just use any object to a document. The document needs to be aware of the object. — Bruno Lowagie, Aug 06 '18 at 11:47
@BrunoLowagie Thanks for the feedback. Are you saying that it is now possible to use the same `PdfFont` object in many `PdfDocument` objects? — Obiwan Kenobi, Aug 06 '18 at 12:09
Are you sure you're not confusing the `PdfFont` class (suited for a specific PDF document) with the `FontProgram` class (suited to create a `PdfFont` for different documents)? It doesn't make sense to use a `PdfFont` for different documents, because different documents require different font subsets. — Bruno Lowagie, Aug 06 '18 at 14:08
@BrunoLowagie I updated my post to add my _UseCase_. You will probably understand why I want to use the same font for different documents. — Obiwan Kenobi, Aug 06 '18 at 14:47
Your design is flawed in the sense that you are creating different PDF files that each have a different font subset. When merging these different PDF files, the different subsets can't be merged (this is currently not supported by iText). To fix your design, you should find a way to create the complete PDF in one go. I don't understand your explanation about storing the content as an image. Images usually consist of pixels and no fonts are involved. On the other hand, I think you don't fully understand my explanation about font subsets. — Bruno Lowagie, Aug 06 '18 at 14:58
I'm going to stop commenting, because you are crossing my personal border between advice that can be given for free and consultancy that can only be provided for a fee. Please respect that personal border, and address other people if you want free advice. — Bruno Lowagie, Aug 06 '18 at 14:59
@BrunoLowagie Sadly, there are some business constraints... I understand your explanation about subsets, but I thought there might be a way to remove duplicate fonts by their name, for example; I don't know... I will check whether there is another way to generate my PDF file in one go. Sadly, nobody knows this library better than you... Thanks for your answers. — Obiwan Kenobi, Aug 06 '18 at 15:23
I'm afraid I have to agree with @Bruno - merging different subsets of the same font is quite some work, one not only has to work on the fonts but also on all the usages of it. And there are quite some boundary conditions to consider. Thus, I'd think this is (far) beyond the scope of a stack overflow question. — mkl, Aug 10 '18 at 14:47

score 0 · Accepted Answer · answered Sep 04 '18 at 08:28

Finally, I found another way to save my PDF files.

Now I no longer save them in the database. I work with only 1 PDF file, and I use a dictionary of PDF fonts to re-use them instead of declaring the fonts several times. This way the number of fonts in the file does not grow (even when it is the "same" font with another subset).
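
A minimal sketch of that re-use idea, not the answerer's actual code. It is written in Java (iText 7 also ships for Java) and deliberately avoids any iText call so that it runs on its own; `FontCache`, `FontCacheDemo` and the stand-in loader are hypothetical names. In real code the loader would presumably wrap the font-creation call shown in the question, and since the comments point out that a `PdfFont` belongs to a single `PdfDocument`, the assumption here is one cache per document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: keeps one font object per font file path. */
final class FontCache<F> {
    private final Map<String, F> fonts = new HashMap<>();
    private final Function<String, F> loader;

    /** loader is assumed to create a font object from a font file path. */
    FontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for this path; the loader runs on first use only. */
    F get(String fontPath) {
        return fonts.computeIfAbsent(fontPath, loader);
    }

    /** Number of distinct fonts created so far. */
    int size() {
        return fonts.size();
    }
}

final class FontCacheDemo {
    public static void main(String[] args) {
        // Stand-in loader: a real one would build the PDF font object here.
        FontCache<Object> cache = new FontCache<>(path -> new Object());

        Object first = cache.get("c:\\windows\\fonts\\Arial.ttf");
        Object second = cache.get("c:\\windows\\fonts\\Arial.ttf");

        // Both lookups yield the same instance, so only one font was created.
        System.out.println("same instance: " + (first == second));
        System.out.println("fonts created: " + cache.size());
    }
}
```

Every place that previously created a new font would instead ask the cache, so each font file contributes one font object, and therefore one subset, to the document.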
