
I am using this piece of code to read through a PDF file, but I am not sure how to extract font style information (for example, whether the text is bold) from it. This library is fairly old, so if there is a newer way of doing this, suggestions are welcome.

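```java
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

System.out.println("Reading pdf " + filename + ".pdf...");
// PDFBox 2.x API (3.x uses Loader.loadPDF instead); try-with-resources closes the document
try (PDDocument document = PDDocument.load(file)) {
    PDFTextStripper pdfStripper = new PDFTextStripper();
    // getText returns plain text only, with no font, weight or size information
    String text = pdfStripper.getText(document);
}
```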
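PDFBox does hand you the font of each glyph: a subclass of `PDFTextStripper` can override `writeString(String text, List<TextPosition> textPositions)`, and every `TextPosition` has `getFont()`. The hard part is deciding whether that font counts as bold. Below is a minimal sketch of one possible heuristic, assuming that a ForceBold flag, a font weight of 700 or more, or a font name containing "Bold", "Black" or "Heavy" is a good enough signal for your PDFs. `BoldHeuristic` and `looksBold` are hypothetical names, and the helper takes plain values rather than PDFBox objects so it can be tried without a PDF.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of PDFBox): guesses whether a font looks bold.
 * In PDFBox 2.x the inputs could come from a TextPosition tp:
 *   tp.getFont().getName()                           -> baseFontName
 *   tp.getFont().getFontDescriptor().getFontWeight() -> fontWeight
 *   tp.getFont().getFontDescriptor().isForceBold()   -> forceBold
 * getFontDescriptor() may return null, so pass 0f / false in that case.
 */
final class BoldHeuristic {
    private BoldHeuristic() {}

    static boolean looksBold(String baseFontName, float fontWeight, boolean forceBold) {
        // ForceBold is an explicit flag in the font descriptor.
        if (forceBold) {
            return true;
        }
        // FontWeight is optional (0 when absent); 700 is the conventional bold weight.
        if (fontWeight >= 700f) {
            return true;
        }
        if (baseFontName == null) {
            return false;
        }
        // Subset fonts carry a six-letter prefix and '+', e.g. "ABCDEF+Arial-BoldMT".
        int plus = baseFontName.indexOf('+');
        String name = (plus >= 0 ? baseFontName.substring(plus + 1) : baseFontName)
                .toLowerCase(Locale.ROOT);
        // Assumption: heavy faces follow common naming conventions.
        return name.contains("bold") || name.contains("black") || name.contains("heavy");
    }
}
```

Inside an overridden `writeString`, you would call this once per `TextPosition` and record the result next to `tp.getUnicode()`. Name-based checks will miss fonts with opaque names and fake bold produced by stroking, so treat the result as a guess.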
Tony
  • See [this](https://stackoverflow.com/a/22825237/231316). In short, it is complicated because "bold" doesn't really exist; instead, it is just a different font that happens to look darker/thicker. – Chris Haas Jan 21 '20 at 19:57
  • Thanks! But wouldn't that mean I read everything and then only the bold text? Basically, I want to read everything as plain text but store the bold text in a data structure. I want to parse exams with solutions that I have (the solutions are in bold), display the questions in the console and ask the user to answer them. I will store the bold option in the object so I can check whether the user was correct, but I have to display everything, not just the correct answer. – Tony Jan 21 '20 at 20:08
  • What it really means is that `PDFTextStripper` doesn't support the concept of bold, italic, font size, color, etc.; it really just gets you text. If you want any of that extended information, you'll need to write your own logic. I've [personally written](https://stackoverflow.com/a/6884297/231316) that logic for other PDF libraries and you might be able to find examples for PDFBox too, but I can tell you that it is a fairly involved process. – Chris Haas Jan 21 '20 at 20:22
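For the exam use case (display everything, but remember which parts are the bold solutions), one approach is to keep a bold flag per glyph while stripping and then merge consecutive glyphs with the same flag into runs. This is a minimal sketch, assuming the per-glyph strings come from `TextPosition.getUnicode()` and the flags from some bold heuristic; `StyledRun` and `RunBuilder` are hypothetical names, not PDFBox classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: a run of consecutive glyphs sharing the same bold flag. */
final class StyledRun {
    final String text;
    final boolean bold;

    StyledRun(String text, boolean bold) {
        this.text = text;
        this.bold = bold;
    }
}

final class RunBuilder {
    private RunBuilder() {}

    /** glyphs.get(i) is one glyph's text; bold.get(i) is its flag. */
    static List<StyledRun> toRuns(List<String> glyphs, List<Boolean> bold) {
        if (glyphs.size() != bold.size()) {
            throw new IllegalArgumentException("glyphs and flags must have the same length");
        }
        List<StyledRun> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean currentBold = false;
        for (int i = 0; i < glyphs.size(); i++) {
            boolean b = bold.get(i);
            // Flush the pending run when the style changes.
            if (current.length() > 0 && b != currentBold) {
                runs.add(new StyledRun(current.toString(), currentBold));
                current.setLength(0);
            }
            currentBold = b;
            current.append(glyphs.get(i));
        }
        if (current.length() > 0) {
            runs.add(new StyledRun(current.toString(), currentBold));
        }
        return runs;
    }

    /** Everything, bold or not: the text to print to the console. */
    static String plainText(List<StyledRun> runs) {
        StringBuilder sb = new StringBuilder();
        for (StyledRun r : runs) {
            sb.append(r.text);
        }
        return sb.toString();
    }

    /** Only the bold runs, e.g. the stored solutions to check answers against. */
    static List<String> boldParts(List<StyledRun> runs) {
        List<String> out = new ArrayList<>();
        for (StyledRun r : runs) {
            if (r.bold) {
                out.add(r.text);
            }
        }
        return out;
    }
}
```

Splitting the text into individual questions (for example, by numbering or line breaks) is left out here; this sketch only separates bold from non-bold text.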
