Team,

I have to validate a flattened PDF as part of a requirement. The PDF contains checkboxes. I used the Apache PDFBox library to read its contents, but it extracts only the text and does not identify the checkboxes. Below is a screenshot of a PDF similar to the one I am using (flat PDF with checkboxes):

[Screenshot: flat PDF with checkboxes]

Can you please suggest an approach to identify and validate these checkboxes?

Code snippet used:

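        import java.io.File;
        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.text.PDFTextStripper;

        // PDFBox 2.x: load the file directly; try-with-resources closes the document.
        try (PDDocument document = PDDocument.load(new File("D:\\test.pdf"))) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1);
            stripper.setEndPage(1);
            stripper.setSortByPosition(true);
            // Returns text only; drawn shapes such as checkboxes are not included.
            String pdfTextContent = stripper.getText(document);
            System.out.println(pdfTextContent);
        }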
Naresh
  • Might be vector graphics. In that case, you'll have to collect the lines: https://stackoverflow.com/questions/38931422/pdfbox-2-0-2-calling-of-pagedrawer-processpage-method-caught-exceptions To find out more, please share a file that has some boxes checked and some not. – Tilman Hausherr Feb 27 '19 at 07:08
  • Might be other things, too. Thus, please definitely share a representative pdf. – mkl Feb 27 '19 at 09:23
  • Thanks for your response, Tilman Hausherr and mkl. I am not able to upload the original PDF as it contains client-sensitive data. I downloaded a similar PDF from the internet; it has some boxes (but without the cross mark). Please find the PDF at the link below: https://drive.google.com/file/d/1yQLnufhJ42-QckyqNFi0I6vXhOorVMW1/view?usp=drivesdk – Magesh Ram Feb 27 '19 at 10:13
  • As far as I can see that pdf does not contain flattened checked check boxes. As you want to recognize flattened checked check boxes, that pdf does not yet help. Furthermore, *"a similar pdf from the internet"* only helps if its check boxes are constructed internally very similarly, it does not suffice that they look similar. – mkl Feb 27 '19 at 13:56
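
Following the vector-graphics suggestion in the comments, one possible approach is to collect the page's line segments first (for example with a subclass of PDFBox's PDFGraphicsStreamEngine) and then test each box outline for strokes drawn inside it. The sketch below covers only that second step and uses plain java.awt.geom, so it runs without PDFBox. The class CheckboxClassifier, the method isChecked, and the INSET tolerance are hypothetical names chosen for illustration. It also assumes the mark is drawn as straight strokes, which may not hold for the real file: the mark could just as well be a font glyph or an image.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: decides whether a box outline is
// checked, given line segments that were already collected from the page.
class CheckboxClassifier {

    // Assumed tolerance in PDF user-space units; tune it for the real documents.
    static final double INSET = 0.5;

    static boolean isChecked(Rectangle2D box, List<Line2D> strokes) {
        // Shrink the box so its own border segments are not counted as marks.
        Rectangle2D inner = new Rectangle2D.Double(
                box.getX() + INSET, box.getY() + INSET,
                box.getWidth() - 2 * INSET, box.getHeight() - 2 * INSET);

        // Keep only strokes whose midpoint falls inside the shrunken box.
        List<Line2D> inside = new ArrayList<>();
        for (Line2D s : strokes) {
            double midX = (s.getX1() + s.getX2()) / 2;
            double midY = (s.getY1() + s.getY2()) / 2;
            if (inner.contains(midX, midY)) {
                inside.add(s);
            }
        }

        // A cross or tick is treated as two or more strokes that touch or cross.
        for (int i = 0; i < inside.size(); i++) {
            for (int j = i + 1; j < inside.size(); j++) {
                if (inside.get(i).intersectsLine(inside.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Rectangle2D box = new Rectangle2D.Double(100, 100, 10, 10);
        List<Line2D> strokes = new ArrayList<>();
        // The four border segments of the box itself.
        strokes.add(new Line2D.Double(100, 100, 110, 100));
        strokes.add(new Line2D.Double(110, 100, 110, 110));
        strokes.add(new Line2D.Double(110, 110, 100, 110));
        strokes.add(new Line2D.Double(100, 110, 100, 100));
        System.out.println(isChecked(box, strokes)); // prints false

        // Two diagonals forming a cross inside the box.
        strokes.add(new Line2D.Double(101, 101, 109, 109));
        strokes.add(new Line2D.Double(101, 109, 109, 101));
        System.out.println(isChecked(box, strokes)); // prints true
    }
}
```

Testing the stroke midpoint against a slightly shrunken box is a deliberate simplification: it keeps the border lines out of the result without needing to know which segments belong to the outline. Whether this matches the real PDF can only be confirmed by inspecting a representative file, as the comments point out.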
