How to retrieve font type and style attributes from a PDF using PDFBox
2
-
Double? http://stackoverflow.com/questions/6939583/how-to-extract-font-styles-of-text-contents-using-pdfbox – Kim Jun 04 '12 at 12:22
-
Kim, thanks for the reply. I tried this and got java.util.EmptyStackException at java.util.Stack.peek(Stack.java:85) at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:601) at pdf22box.main(pdf22box.java:13) – satish john Jun 05 '12 at 04:23
-
However, I am able to extract the text from the PDF – satish john Jun 05 '12 at 04:24
-
I get the following result after trying getFonts. Could you help me understand the content? {TT1=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@74b2002f, TT2=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@522a4983} {TT4=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@79f6f296, TT3=org.apache.pdfbox.pdmodel.font.PDTrueTypeFont@43b09468} – satish john Jun 05 '12 at 05:09
-
What I see are the objects and their addresses. I guess you need to read out the contents of those objects (i.e. by using their properties, like the name of the font, etc.). – Kim Jun 05 '12 at 12:14
1 Answer
1
If you want to get the font of a single character in the PDF document, you can call textPosition.getFont().getFontDescriptor().getFontName(), where textPosition is an instance of the class TextPosition.
Every character of a PDF document corresponds to a TextPosition object.
You can get the TextPosition objects of a PDF document by overriding the processTextPosition(TextPosition t) method of PDFTextStripper, or with the getCharactersByArticle() method of PDFTextStripper.
For the latter, since getCharactersByArticle() is protected, extend the PDFTextStripper class like this:
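import java.io.IOException;
import java.util.List;
import java.util.Vector;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class MyPDFTextStripper extends PDFTextStripper {
    public MyPDFTextStripper() throws IOException {
        super();
    }

    // Public wrapper around the inherited getCharactersByArticle()
    public Vector<List<TextPosition>> myGetCharactersByArticle() {
        return getCharactersByArticle();
    }
}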
... to get the list of TextPositions for a single page use:
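MyPDFTextStripper stripper = new MyPDFTextStripper();
PDDocument doc = PDDocument.load(new File(filename));
// setStartPage/setEndPage are 1-based, so a zero-based pageNr needs +1
stripper.setStartPage(pageNr + 1);
stripper.setEndPage(pageNr + 1);
// getText() has to run first; it fills the character list read below
stripper.getText(doc);
Vector<List<TextPosition>> list = stripper.myGetCharactersByArticle();
doc.close();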
... and finally to get the font for a single character just type:
textPosition.getFont().getFontDescriptor().getFontName()
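The font name returned above is only a string such as "ABCDEF+Arial-BoldMT", so style attributes like bold or italic still have to be derived from it. The following is a minimal sketch of one way to do that, not part of PDFBox: FontNameStyle and its parse method are hypothetical names, and the keyword matching is a heuristic that assumes the font follows the common "Family-Style" naming habit. The only fixed rule it relies on is that subset fonts carry a tag of six uppercase letters followed by "+". Note also that getFontDescriptor() may return null for some fonts, so a null check before calling getFontName() is prudent. PDFontDescriptor also offers isItalic() and getFontWeight(), though many PDFs appear to leave those values unset.

```java
import java.util.Locale;

// Hypothetical helper (not a PDFBox class): guesses style flags from a font name string.
final class FontNameStyle {
    final String family;   // name without subset tag and style suffix
    final boolean bold;    // heuristic: name contains "bold"
    final boolean italic;  // heuristic: name contains "italic" or "oblique"

    private FontNameStyle(String family, boolean bold, boolean italic) {
        this.family = family;
        this.bold = bold;
        this.italic = italic;
    }

    static FontNameStyle parse(String fontName) {
        String name = fontName;
        // Subset fonts are prefixed with six uppercase letters and '+', e.g. "ABCDEF+Arial-BoldMT".
        if (name.matches("^[A-Z]{6}\\+.*")) {
            name = name.substring(7);
        }
        String lower = name.toLowerCase(Locale.ROOT);
        boolean bold = lower.contains("bold");
        boolean italic = lower.contains("italic") || lower.contains("oblique");
        // Assumption: the family is everything before the first '-' or ','.
        int cut = firstSeparator(name);
        String family = cut >= 0 ? name.substring(0, cut) : name;
        return new FontNameStyle(family, bold, italic);
    }

    // Index of the first '-' or ',' in s, or -1 if neither occurs.
    private static int firstSeparator(String s) {
        int dash = s.indexOf('-');
        int comma = s.indexOf(',');
        if (dash < 0) return comma;
        if (comma < 0) return dash;
        return Math.min(dash, comma);
    }

    public static void main(String[] args) {
        FontNameStyle style = parse("ABCDEF+Arial-BoldMT");
        System.out.println(style.family + " bold=" + style.bold + " italic=" + style.italic);
    }
}
```

In the answer's setup you would pass the result of textPosition.getFont().getFontDescriptor().getFontName() to FontNameStyle.parse. Fonts with opaque names (for example the TT1, TT2 resource keys from the comments) will not match any keyword, so treat a false flag as "unknown" rather than "regular".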

matthiasboesinger