
I am using pdf.js. When I fetch the text content, I get blocks with font info:

Object { 
     str: "blabla", 
     dir: "ltr", 
     width: 191.433141, 
     height: 12.546, 
     transform: Array[6], 
     fontName: "g_d0_f2" 
}

Is it possible to get more information about g_d0_f2 somehow?

Paflow
Via the undocumented API: the page object has a commonObjs property (https://github.com/mozilla/pdf.js/blob/master/examples/svgviewer/viewer.js#L29). It gives you the TTF/OTF data, which you can parse to find all the information you need :) – async5 Nov 17 '16 at 14:09

1 Answer


Note that PDF.js's getTextContent does not, and is not supposed to, match the glyphs in a PDF. The PDF 32000 specification defines two different algorithms, one for text display and one for text extraction. Even if you can look up font data in page.commonObjs, it may not be very helpful for displaying the extracted text content, because the glyph encodings can differ from the extracted characters.
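If coarse information is enough, the object returned by getTextContent also carries a styles map keyed by fontName, next to items. Here is a minimal sketch of reading it. The helper name fontStylesUsed is made up for illustration, and the exact set of style fields (fontFamily, ascent, descent, vertical) may differ between PDF.js versions:

```javascript
// Hypothetical helper: given the object resolved by page.getTextContent(),
// return the style entry for every fontName that appears in the text items.
function fontStylesUsed(textContent) {
  const styles = textContent.styles || {};
  const result = {};
  for (const item of textContent.items) {
    if (!(item.fontName in result)) {
      // null marks a font name that has no entry in the styles map
      result[item.fontName] = styles[item.fontName] || null;
    }
  }
  return result;
}

// Usage with a real page (assumes `page` came from pdfDocument.getPage(n)):
//   page.getTextContent().then(tc => console.log(fontStylesUsed(tc)));
```

This only tells you the CSS font family and a few metrics that PDF.js chose for the text layer, not the embedded font program itself.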

The page's getTextContent performs text extraction, while getOperatorList returns the (glyph) display operators. See how the src/display/svg.js renderer displays glyphs.
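For the commonObjs route, a rough sketch could look like the following. It relies on undocumented internals, so treat every detail as an assumption: that fonts are resolved once getOperatorList has finished, that commonObjs.get throws for an unresolved id, and that the font object exposes name, type and data (newer versions may drop data after loading). The helper names lookupFont and summarizeFont are made up:

```javascript
// Hypothetical helper: reduce a PDF.js font object to a few printable fields.
// The field names are assumptions about an undocumented object.
function summarizeFont(font) {
  if (!font) {
    return null;
  }
  return {
    name: font.name || null,                        // e.g. the PostScript name
    type: font.type || null,                        // e.g. "TrueType", "Type1"
    dataBytes: font.data ? font.data.length : 0,    // size of the TTF/OTF data
  };
}

// Hypothetical helper: look up a fontName such as "g_d0_f2" on a page.
async function lookupFont(page, fontName) {
  // Building the operator list makes PDF.js load the fonts the page uses.
  await page.getOperatorList();
  try {
    return summarizeFont(page.commonObjs.get(fontName));
  } catch (e) {
    return null; // the object was not resolved (or the internals changed)
  }
}

// Usage (assumes `page` came from pdfDocument.getPage(n)):
//   lookupFont(page, "g_d0_f2").then(info => console.log(info));
```

If data is present, it can be handed to a font parser of your choice to read the tables of the embedded font.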

async5