While extracting the font name from a PDF, I sometimes get junk characters followed by a plus sign and then the font name with the font style. I want to remove the junk characters. This only happens for a few PDFs. Example: MMLPEO+RemingtonNoiseless

string curFont = renderInfo.GetFont().PostscriptFontName;
Bruno Lowagie
pdp

2 Answers

For an explanation have a look at section 9.6.4 Font Subsets of the PDF specification ISO 32000-1:2008:

For a font subset, the PostScript name of the font — the value of the font’s BaseFont entry and the font descriptor’s FontName entry — shall begin with a tag followed by a plus sign (+). The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file shall have different tags.

EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.

Thus, those characters aren't junk; they are the subset tag.

mkl

The "junk" characters indicate that the font isn't embedded completely: only a subset of its glyphs is present. You'll find names such as ABCDEF+RemingtonNoiseless, UVWXYZ+RemingtonNoiseless, and so on, meaning that there may be different subsets of the same font inside the PDF.

For an explanation have a look at section 9.6.4 Font Subsets of the PDF specification ISO 32000-1:2008:

For a font subset, the PostScript name of the font — the value of the font’s BaseFont entry and the font descriptor’s FontName entry — shall begin with a tag followed by a plus sign (+). The tag shall consist of exactly six uppercase letters; the choice of letters is arbitrary, but different subsets in the same PDF file shall have different tags.

EXAMPLE EOODIA+Poetica is the name of a subset of Poetica®, a Type 1 font.

In other words: these characters aren't merely "junk". If you want to remove them, any ordinary string manipulation method will do, but be aware that removing them throws away information that may be useful in some contexts, for example when telling two subsets of the same font apart.
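One way to do the string manipulation is sketched below. It assumes the tag has exactly the form the specification describes (six uppercase letters followed by a plus sign) and leaves every other name untouched. The class name `FontNames` and the method name `StripSubsetTag` are made up for this illustration; they are not part of iTextSharp.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class FontNames
{
    // Subset tag as described in ISO 32000-1, section 9.6.4:
    // exactly six uppercase letters at the start of the name, then '+'.
    private static readonly Regex SubsetTag =
        new Regex(@"^[A-Z]{6}\+", RegexOptions.CultureInvariant);

    // Returns the font name without a leading subset tag.
    // Names that carry no tag (or are null/empty) come back unchanged.
    public static string StripSubsetTag(string postscriptName)
    {
        if (string.IsNullOrEmpty(postscriptName))
            return postscriptName;

        return SubsetTag.Replace(postscriptName, string.Empty);
    }
}
```

With the line from the question, this would be called as `string curFont = FontNames.StripSubsetTag(renderInfo.GetFont().PostscriptFontName);`. Anchoring the pattern at the start of the name and requiring six uppercase letters avoids cutting a name that merely contains a plus sign somewhere else.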

Bruno Lowagie
  • *meaning that there are different subsets of the same font inside the PDF* — not necessarily, there merely *may* be. – mkl May 16 '13 at 07:02
    Well, it also could be called nitpicking... ;) – mkl May 16 '13 at 07:44