I have been trying to extract (x, y) coordinates from a PDF document using PHP, so far without luck. Is it possible to do this with Ghostscript? If so, how? The solution doesn't necessarily have to be in PHP.
Viewed 1,201 times
-
What sort of x,y coordinates? The position of a certain piece of content in the document? Some coordinates that are described in the rendered text of the document? – Quentin May 20 '13 at 08:25
-
X, y coordinates for searching text and images in a PDF book or document. Example of output x, y coordinates: 14ccc:720:7818:0:111:59;118;176... – blash May 20 '13 at 08:36
-
I still don't understand what the *input* is. – Quentin May 20 '13 at 08:40
-
Sorry, I didn't make it clear enough! I'm using images to display the pages of a book, and the coordinates of each text item in the book were generated with some tool. I need to use these coordinates to highlight text being searched for in the book. – blash May 20 '13 at 08:49
-
So you have a PDF containing pictures of text and you want to OCR that text and identify the locations of matched text within the images? Ouch. – Quentin May 20 '13 at 08:51
-
I converted the PDF pages to images using Ghostscript, and I'm using these images as the pages of the book. I've not heard of a way to search an image for text. What I intend to do is store the coordinates of all the text in the book in a text file, search that text file for the searched phrase, and highlight where the phrase is found on the image. – blash May 20 '13 at 08:58
-
So in essence you have some PDF files, converted them to some image format, and provide these image files to someone. Now you want to allow that someone to search for some text. As searching for text looks easier to you in the PDF, you want to search the PDFs, get the positions of the occurrences found there, then mark the images at those positions, and provide the marked images. (Or, alternatively, extract all text with positions from the PDF first and then use this information to achieve the same.) If that's the case, I could suggest some Java libraries allowing such text extraction tasks. – mkl May 20 '13 at 09:09
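The search-and-mark step summarized above can be sketched roughly as follows. This is only an illustration under several assumptions: the word list with positions has already been extracted by some tool (the extraction itself is not shown), positions are in PDF points with the origin at the bottom-left of an unrotated page, the page image was rendered at a known resolution in dpi, and the phrase sits on a single line. The names `Word`, `find_phrase` and `to_pixel_box` are hypothetical, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One extracted word with its box in PDF points (hypothetical record layout)."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge (PDF origin is bottom-left)
    x1: float  # right edge
    y1: float  # top edge


def find_phrase(words, phrase):
    """Return every run of consecutive words matching the phrase, ignoring case."""
    target = phrase.lower().split()
    n = len(target)
    hits = []
    if n == 0:
        return hits
    for i in range(len(words) - n + 1):
        run = words[i:i + n]
        if [w.text.lower() for w in run] == target:
            hits.append(run)
    return hits


def to_pixel_box(run, page_height_pt, dpi):
    """Convert a run of words to a (left, top, right, bottom) box in image pixels.

    Assumes 72 points per inch and flips the y axis, because image
    coordinates start at the top-left while PDF coordinates start at
    the bottom-left.
    """
    scale = dpi / 72.0
    left = min(w.x0 for w in run) * scale
    right = max(w.x1 for w in run) * scale
    top = (page_height_pt - max(w.y1 for w in run)) * scale
    bottom = (page_height_pt - min(w.y0 for w in run)) * scale
    return (round(left), round(top), round(right), round(bottom))


# Made-up sample data: two words on a 792 pt high page rendered at 144 dpi.
words = [Word("Hello", 72, 700, 100, 712), Word("World", 104, 700, 140, 712)]
boxes = [to_pixel_box(run, 792, 144) for run in find_phrase(words, "hello world")]
```

The resulting pixel boxes could then be drawn over the page image, for instance as semi-transparent rectangles. A phrase that wraps across lines would need one box per line rather than a single bounding box.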
-
Can I achieve this using JavaScript? It's a browser-based app. – blash May 20 '13 at 09:13
-
When you say *JavaScript* and *browser based app*, you seem to imply that the PDFs in question are available on the client. What you said before made me assume that you tried to *not* provide the PDFs to the clients. Thus, please clarify. That being said, I'm not aware of JavaScript PDF text parsing code, but I'm not really into JavaScript anyway. – mkl May 20 '13 at 09:42