
There are lots of ways to extract text from PDFs, and to extract comments from PDFs. But is there any way to extract the text and comments together from PDF files, so that the comment associated with each segment of text is clear?

So far, I have been able to do this with Google Docs (see "Export Google Docs comments into Google Sheets, along with highlighted text?"), but not with PDFs. Converting the PDF to a .docx mangles the formatting badly, so it doesn't seem to be a viable option.

  • The [LEADTOOLS PDF SDK](https://www.leadtools.com/sdk/pdf) has the [PDFDocument.ParsePages](https://www.leadtools.com/help/sdk/dh/pdf/pdfdocument-parsepages.html) method, which extracts all PDF objects from a given PDF, including text and comment annotations. You can then inspect the properties of each extracted [PDFAnnotation](https://www.leadtools.com/help/sdk/dh/pdf/pdfannotation.html) object to see whether it is a comment and how its bounds and position relate to the extracted text objects. (Disclaimer: I am an employee of the vendor) – Hussam Barouqa May 04 '23 at 15:29
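
The bounds comparison described in the comment above can be sketched independently of any particular PDF library. The following is a minimal illustration in Python, not the LEADTOOLS API: the `Word` and `Comment` types, the function names, and the `min_fraction` threshold are all hypothetical, and it assumes you have already extracted word bounding boxes and annotation rectangles (in the same page coordinate system, words in reading order) with whatever tool you use. It also assumes highlight-style annotations whose rectangle covers the text they refer to; sticky-note icons that merely sit near the text would need a nearest-neighbour rule instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Rectangle as (x0, y0, x1, y1) in page coordinates (assumed convention).
Rect = Tuple[float, float, float, float]


@dataclass
class Word:
    """Hypothetical container for one extracted word and its bounding box."""
    text: str
    box: Rect


@dataclass
class Comment:
    """Hypothetical container for one annotation: its note and its rectangle."""
    note: str
    box: Rect


def area(r: Rect) -> float:
    """Area of a rectangle, or 0.0 if it is degenerate."""
    return max(r[2] - r[0], 0.0) * max(r[3] - r[1], 0.0)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0.0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def text_for_comment(comment: Comment, words: List[Word],
                     min_fraction: float = 0.5) -> str:
    """Join the words whose box lies mostly inside the comment's rectangle."""
    picked = [
        w.text
        for w in words
        if area(w.box) > 0
        and overlap_area(w.box, comment.box) / area(w.box) >= min_fraction
    ]
    return " ".join(picked)


def pair_comments_with_text(comments: List[Comment],
                            words: List[Word]) -> List[Tuple[str, str]]:
    """Return (commented text, comment note) pairs, one per comment."""
    return [(text_for_comment(c, words), c.note) for c in comments]


# Made-up example data: three words on one line, one highlight over the first two.
words = [
    Word("Extract", (10, 10, 50, 20)),
    Word("text", (55, 10, 75, 20)),
    Word("here", (80, 10, 100, 20)),
]
comments = [Comment("Which tool?", (9, 9, 76, 21))]

print(pair_comments_with_text(comments, words))
# [('Extract text', 'Which tool?')]
```

The resulting pairs can be written straight to a CSV or spreadsheet, which gives roughly the same text-plus-comment table as the Google Docs export mentioned in the question.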
