How to extract font styles of text contents using pdfbox?

Question

I am using the PDFBox library to extract text content from a PDF file. I am able to extract all the text, but I couldn't find a method to extract font styles.

This helped me to find the font information- http://stackoverflow.com/questions/21705961/get-font-of-each-line-using-pdfbox — EvilInside, Nov 17 '14 at 22:57

score 17 · Accepted Answer · edited May 13 '13 at 13:39


This is not the right way to extract fonts. To read the fonts, iterate through the PDF's pages and read each page's font resources, as below:

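// PDFBox 1.x API: getAllPages() and getFonts() are no longer available in 2.x
PDDocument doc = PDDocument.load("C:/mydoc3.pdf");
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    // Maps each font's resource name on this page to its PDFont
    Map<String, PDFont> pageFonts = page.getResources().getFonts();
}
doc.close();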

answered Mar 02 '12 at 18:12 by Harpreet · edited May 13 '13 at 13:39 by bcoughlan

How can I set fetched font for my android textview? – lazydevpro Oct 25 '21 at 17:02

score 0 · Answer 2 · answered Jul 19 '18 at 10:41

// PDFBox 2.x API
File file = new File("sample.pdf");
try (PDDocument document = PDDocument.load(file))
{
    for (int i = 0; i < document.getNumberOfPages(); ++i)
    {
        PDPage page = document.getPage(i);
        PDResources res = page.getResources();
        // Print the name of every font registered in this page's resources
        for (COSName fontName : res.getFontNames())
        {
            PDFont font = res.getFont(fontName);
            System.out.println(font.getName());
        }
    }
}
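
The names printed by font.getName() often carry the style in the name itself, for example "ABCDEF+Arial-BoldMT", where the six uppercase letters and the plus sign are a subset tag. The sketch below is not part of the original answer: FontNameStyle and its methods are hypothetical helpers (not PDFBox classes), and matching on "bold", "italic" or "oblique" is only a naming heuristic that will miss fonts whose names do not follow that convention.

```java
import java.util.Locale;

/** Hypothetical helper that guesses a style from a PDF font name. Not a PDFBox class. */
public class FontNameStyle {

    /** Strips a subset tag such as "ABCDEF+" (six uppercase letters, then a plus sign). */
    public static String baseName(String fontName) {
        if (fontName != null && fontName.matches("[A-Z]{6}\\+.+")) {
            return fontName.substring(7);
        }
        return fontName;
    }

    /** Heuristic: the base name mentions "bold" (also matches "SemiBold", "DemiBold"). */
    public static boolean looksBold(String fontName) {
        return normalized(fontName).contains("bold");
    }

    /** Heuristic: the base name mentions "italic" or "oblique". */
    public static boolean looksItalic(String fontName) {
        String n = normalized(fontName);
        return n.contains("italic") || n.contains("oblique");
    }

    /** Removes the subset tag first so that a tag like "BOLDAB+" cannot cause a false match. */
    private static String normalized(String fontName) {
        String base = baseName(fontName);
        return base == null ? "" : base.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Sample value, standing in for what font.getName() might return
        String name = "ABCDEF+Arial-BoldItalicMT";
        System.out.println(baseName(name)
                + " bold=" + looksBold(name)
                + " italic=" + looksItalic(name));
    }
}
```

In the loop above, the same checks could be applied to font.getName() for each font. A name-based guess like this should be treated as a fallback, since nothing requires a font's name to describe its style.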

score 0 · Answer 3 · answered Aug 11 '11 at 06:00


import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class pdf2box {
    public static void main(String args[])
    {
        try
        {
            PDDocument pddDocument = PDDocument.load("table2.pdf");
            PDFTextStripper textStripper = new PDFTextStripper();
            System.out.println(textStripper.getText(pddDocument));
            // PDFBox 1.x only; the returned font map is not used here
            textStripper.getFonts();
            pddDocument.close();
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}


This gives me an empty map, while Harpreet's answer gives me the expected output. – bcoughlan May 13 '13 at 13:39

PDFTextStripper does not have a `getFonts()` method in PDFBox 2.0.4. – Om Prakash Mar 27 '17 at 04:28
