I have an application that extracts headings from PDF files. The documents it is supposed to work with all have a more or less coherent structure and formatting, so telling whether a text chunk is bold or not is very important. Recently I came across a bunch of files in which some chunks visually appear bold but do not have "bold" in the string representation of the font name. The SO thread "how can i get text formatting with iTextSharp" helped me understand that there is one more way of making text appear bold. However, in my case calling GetTextRenderMode() does not help either, as it returns 0, as if it were normal text. So are there any other ways of making text appear bold, and is it possible to detect them using iTextSharp?
Please share the PDF in question for analysis. There are additional ways to make glyphs appear bold, e.g. double-printing with a tiny offset. – mkl Jan 21 '15 at 11:11
Here is a single page that gives a good idea of what I was writing about: https://www.dropbox.com/sh/thhbp3qy8hpybxe/AABJtS5UkXE32V_kBFu_uPQea?dl=0 . There are 2 headings that appear bold, but their font name is "JOJJAH+TT116t00", and GetTextRenderMode() returns 0 for all pieces of both. – user2082616 Jan 21 '15 at 11:59
1 Answer
You are making the assumption that the font inside your PDF file knows if it's bold or not. Let's take a look inside and check if your assumption is correct.
This is what the subset JOJJAH of the font TT116t00 looks like when you look at the internals of the PDF file you have shared:
We see that the font is of subtype /TrueType, we see that the /ItalicAngle is 0, and... we see that the 3rd bit of the /Flags is set. Let's check the PDF reference to find out what this tells us:
I quote:
The font contains glyphs outside the Adobe standard Latin character set.
The glyphs look bold because they are drawn in a way that makes them appear bold. You see the font as bold because you are human. However, when a machine looks at the font, it doesn't have a clue that the font is bold. A machine just follows the instructions stored in the /FontFile2 stream.
In short: iTextSharp doesn't have any indications that the font is bold.
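To make the flag check above concrete, here is a minimal Python sketch (independent of iTextSharp) that decodes a font descriptor's /Flags value using the bit positions listed in the PDF reference. The helper names decode_font_flags and has_bold_hint are hypothetical, and the value 4 assumes that only the 3rd bit is set, as described above; the actual number stored in the file isn't quoted here.

```python
from typing import List

# Font descriptor flag bits from the PDF reference.
# Bit positions are 1-based: bit n has the value 2 ** (n - 1).
FONT_FLAGS = {
    1: "FixedPitch",
    2: "Serif",
    3: "Symbolic",      # glyphs outside the Adobe standard Latin character set
    4: "Script",
    6: "Nonsymbolic",
    7: "Italic",
    17: "AllCap",
    18: "SmallCap",
    19: "ForceBold",
}


def decode_font_flags(flags: int) -> List[str]:
    """Return the names of all flag bits that are set, lowest bit first."""
    return [name for bit, name in FONT_FLAGS.items() if flags & (1 << (bit - 1))]


def has_bold_hint(font_name: str, flags: int) -> bool:
    """Hypothetical heuristic: True only if the metadata itself hints at bold.

    It checks the font name and the ForceBold flag; it cannot see how the
    glyph outlines in the font program are actually drawn.
    """
    return "bold" in font_name.lower() or "ForceBold" in decode_font_flags(flags)


if __name__ == "__main__":
    flags = 4  # assumption: only the 3rd bit is set
    print(decode_font_flags(flags))                 # ['Symbolic']
    print(has_bold_hint("JOJJAH+TT116t00", flags))  # False
```

With only the Symbolic bit set and no "bold" in the font name, neither check fires, which is consistent with the conclusion above. Note that ForceBold is only a rendering hint, so its absence says nothing about how the glyphs are actually drawn.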

Thanks a lot! I totally missed the point that it can be a matter of non-standard glyphs that already look bold. At least now I am sure that there is no simple way of picking out everything that LOOKS bold. – user2082616 Jan 21 '15 at 13:44