7

I am using the PDFBox library to extract text content from a PDF file. I am able to extract all the text, but I couldn't find a method to extract font styles.

alexblum
Master Stroke
    This helped me find the font information: http://stackoverflow.com/questions/21705961/get-font-of-each-line-using-pdfbox – EvilInside Nov 17 '14 at 22:57

3 Answers

17

That is not the right way to extract fonts. To read the fonts, iterate through the PDF's pages and get each page's font resources, as below:

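// PDFBox 1.8.x API
PDDocument doc = PDDocument.load("C:/mydoc3.pdf");
List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    // Maps each font's resource name on this page to its PDFont
    Map<String, PDFont> pageFonts = page.getResources().getFonts();
}
doc.close();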
bcoughlan
Harpreet
0
// PDFBox 2.x API
File file = new File("sample.pdf");
PDDocument document = PDDocument.load(file);

for (int i = 0; i < document.getNumberOfPages(); ++i)
{
    PDPage page = document.getPage(i);
    PDResources res = page.getResources();
    if (res == null)
    {
        continue; // a page may have no resources at all
    }
    // Each page lists its fonts by resource name
    for (COSName fontName : res.getFontNames())
    {
        PDFont font = res.getFont(fontName);
        System.out.println(font.getName());
    }
}
document.close();
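
The names printed by font.getName() often carry the style in the name itself (for example a "-Bold" or "-Italic" suffix), and subsetted fonts are prefixed with a tag of six uppercase letters and a plus sign. Below is a rough sketch, in plain Java with no PDFBox dependency, of how one might turn such a name into a style label. The class FontStyleGuess, its method names, and the sample font names are made up for illustration, and the keyword matching is only a heuristic: a font whose name says nothing about its style will be reported as "regular".

```java
import java.util.Locale;

public class FontStyleGuess {

    // Removes a subset tag such as "ABCDEF+" (six uppercase letters and a plus sign),
    // which PDF producers prepend to the names of subsetted fonts.
    public static String stripSubsetTag(String fontName) {
        if (fontName != null && fontName.matches("[A-Z]{6}\\+.+")) {
            return fontName.substring(7);
        }
        return fontName;
    }

    // Heuristic only: looks for common style keywords in the font name.
    public static String guessStyle(String fontName) {
        if (fontName == null) {
            return "unknown";
        }
        String name = stripSubsetTag(fontName).toLowerCase(Locale.ROOT);
        boolean bold = name.contains("bold");
        boolean italic = name.contains("italic") || name.contains("oblique");
        if (bold && italic) {
            return "bold italic";
        }
        if (bold) {
            return "bold";
        }
        if (italic) {
            return "italic";
        }
        return "regular";
    }

    public static void main(String[] args) {
        // Sample names are invented; in practice they would come from font.getName().
        String[] samples = {"ABCDEF+Arial-BoldMT", "Times-Italic", "Helvetica"};
        for (String n : samples) {
            System.out.println(n + " -> " + guessStyle(n));
        }
    }
}
```

In the loop above, one could call FontStyleGuess.guessStyle(font.getName()) instead of printing the raw name.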
Walid Bousseta
0
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class pdf2box {
    public static void main(String args[])
    {
        try
        {
            PDDocument pddDocument = PDDocument.load("table2.pdf");
            PDFTextStripper textStripper = new PDFTextStripper();
            System.out.println(textStripper.getText(pddDocument));
            textStripper.getFonts();
            pddDocument.close();
        }
        catch (Exception ex)
        {
            ex.printStackTrace();
        }
    }
}