I'm trying to extract the text that lies inside Square annotations in a PDF. I use the code below to extract the text with PDFBox.
CODE
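import java.awt.geom.Rectangle2D;
import java.io.File;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.util.PDFTextStripperByArea;

// Runs inside a method; usr and testTxt are defined elsewhere.
try {
    PDDocument document = null;
    try {
        document = PDDocument.load(new File("//Users//" + usr + "//Desktop//BoldTest2 2.pdf"));
        List allPages = document.getDocumentCatalog().getAllPages();
        for (int i = 0; i < allPages.size(); i++) {
            PDPage page = (PDPage) allPages.get(i);
            // Fonts declared in the page resources (not used below)
            Map<String, PDFont> pageFonts = page.getResources().getFonts();
            List<PDAnnotation> la = page.getAnnotations();
            for (int f = 0; f < la.size(); f++) {
                PDAnnotation pdfAnnot = la.get(f);
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                PDRectangle rect = pdfAnnot.getRectangle();
                float x = 0;
                float y = 0;
                float width = 0;
                float height = 0;
                // Only unrotated pages are handled; otherwise the region stays empty
                int rotation = page.findRotation();
                if (rotation == 0) {
                    x = rect.getLowerLeftX();
                    y = rect.getUpperRightY() - 2;
                    width = rect.getWidth();
                    height = rect.getHeight();
                    // The annotation rectangle has its origin at the bottom-left,
                    // the stripper region at the top-left, so flip the y coordinate
                    PDRectangle pageSize = page.findMediaBox();
                    y = pageSize.getHeight() - y;
                }
                Rectangle2D.Float awtRect = new Rectangle2D.Float(x, y, width, height);
                stripper.addRegion(Integer.toString(f), awtRect);
                stripper.extractRegions(page);
                PrintTextLocation2 prt = new PrintTextLocation2(); // my own class, currently unused
                // Keep only the text found under Square annotations
                if (pdfAnnot.getSubtype().equals("Square")) {
                    testTxt = testTxt + "\n " + stripper.getTextForRegion(Integer.toString(f));
                }
            }
        }
    } catch (Exception ex) {
        ex.printStackTrace(); // do not swallow errors silently
    } finally {
        if (document != null) {
            document.close();
        }
    }
} catch (Exception ex) {
    ex.printStackTrace(); // document.close() can throw as well
}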
With this code I only get the plain text. How can I also get the font information, such as bold and italic, together with the text? Any advice or references would be highly appreciated.
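
To make the goal concrete, here is a rough sketch of the style-detection half of the problem, under stated assumptions. The stripper itself only returns strings, so the font would have to be read per character. As far as I understand, in PDFBox 1.8.x that is usually done by subclassing the stripper and overriding processTextPosition(TextPosition), where text.getFont().getBaseFont() should give the font name; that hook is not shown here and needs checking against the PDFBox version in use. The helper below (FontStyleGuess is a hypothetical name, not part of PDFBox) only guesses bold/italic from such a font name. It is a heuristic: a font that does not encode its style in its name is reported as regular.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: guesses the style from a font name.
final class FontStyleGuess {
    private FontStyleGuess() {}

    // Embedded subset fonts are named like "ABCDEF+Helvetica-Bold";
    // drop the six-letter tag and the '+' if they are present.
    static String stripSubsetPrefix(String fontName) {
        if (fontName == null || fontName.length() < 8 || fontName.charAt(6) != '+') {
            return fontName;
        }
        for (int i = 0; i < 6; i++) {
            char c = fontName.charAt(i);
            if (c < 'A' || c > 'Z') {
                return fontName;
            }
        }
        return fontName.substring(7);
    }

    // Lower-cased name without the subset tag; empty string for null.
    private static String normalized(String fontName) {
        String name = stripSubsetPrefix(fontName);
        return name == null ? "" : name.toLowerCase(Locale.ROOT);
    }

    static boolean isBold(String fontName) {
        return normalized(fontName).contains("bold");
    }

    static boolean isItalic(String fontName) {
        String name = normalized(fontName);
        return name.contains("italic") || name.contains("oblique");
    }

    // Returns "bold italic", "bold", "italic" or "regular".
    static String describe(String fontName) {
        boolean bold = isBold(fontName);
        boolean italic = isItalic(fontName);
        if (bold && italic) {
            return "bold italic";
        }
        if (bold) {
            return "bold";
        }
        return italic ? "italic" : "regular";
    }
}
```

For example, FontStyleGuess.describe("ABCDEF+Helvetica-BoldOblique") returns "bold italic" and FontStyleGuess.describe("Courier") returns "regular". The name check is only a starting point; the font descriptor, where one exists, may carry more reliable flags.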