I'm trying to extract text from a particular section of a PDF. If I know the X,Y coordinates of the area, I can extract the text, but I'm unable to get the coordinates of a selected area from the PDF. Please help if anyone has already tried this.
-
Can you explain what you mean? You say that you have X,Y coordinates and that you know how to extract text. Which X,Y coordinates do you still need? This looks like a duplicate of http://stackoverflow.com/questions/23909893/getting-coordinates-of-string-using-itextextractionstrategy-and-locationtextextr/ – Bruno Lowagie Jun 25 '14 at 05:38
-
Actually, I've hard-coded the X,Y coordinates to extract text. What I need is this: when I display the PDF in the browser and select an area, I need the coordinates of that selection. One more doubt I have: if we get coordinates from the browser, will they match the original PDF's coordinates? – Sasikumar Jun 25 '14 at 05:58
-
The coordinate system in the browser is different from the coordinate system in PDF. There will be differences. This question seems to move more in the direction of pdf.js than in the direction of iText. – Bruno Lowagie Jun 25 '14 at 07:14
-
I'm with @BrunoLowagie, this is a client-side problem, so iText and PDFBox aren't involved. PDF.js parses a PDF and renders it to the HTML canvas. Knowing that, you can just monitor the HTML canvas and ignore the PDF completely. This answer has a sample that might help: http://stackoverflow.com/a/15654197/231316 – Chris Haas Jun 25 '14 at 17:00
-
Thanks BrunoLowagie and Chris Haas. PDF.js has what I expected... – Sasikumar Jun 27 '14 at 09:10
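
On the doubt above about whether browser coordinates match the PDF's: they generally won't match directly, but the mapping is simple arithmetic in the common case. A canvas measures in pixels from the top-left corner with y growing downward, while PDF default user space measures in points (1/72 inch) from the bottom-left corner with y growing upward. The following is only a rough sketch of that conversion, not what PDF.js or iText does internally. The function name `canvas_rect_to_pdf` and its parameters are hypothetical, and it assumes an unrotated page whose visible box starts at (0, 0) and was rendered at a single uniform `scale` (canvas pixels per PDF point).

```python
def canvas_rect_to_pdf(x0, y0, x1, y1, scale, page_height_pt):
    """Convert a selection rectangle from canvas pixels to PDF points.

    Assumptions (not taken from the thread): the page is unrotated, its
    visible box has its origin at (0, 0), and it was rendered at a uniform
    scale of `scale` canvas pixels per PDF point.

    Returns (llx, lly, urx, ury): lower-left and upper-right corners in
    PDF user space.
    """
    # x only needs rescaling; sort so a right-to-left drag also works.
    llx, urx = sorted((x0 / scale, x1 / scale))

    # On the canvas, the smaller y is the top edge of the selection.
    top_px, bottom_px = sorted((y0, y1))

    # Flip the y axis: canvas y grows downward, PDF y grows upward.
    lly = page_height_pt - bottom_px / scale
    ury = page_height_pt - top_px / scale

    return (llx, lly, urx, ury)


# Example: an A4 page (595 x 842 pt) rendered at 2 pixels per point,
# with a selection dragged from pixel (200, 300) to pixel (600, 500).
rect = canvas_rect_to_pdf(200, 300, 600, 500, scale=2.0, page_height_pt=842.0)
print(rect)  # (100.0, 592.0, 300.0, 692.0)
```

The resulting rectangle is in the same units as the hard-coded coordinates mentioned earlier, so it could be passed to the server-side extraction in their place. Rotated pages, or pages whose crop box does not start at the origin, would need extra handling that this sketch leaves out.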