How to get the position of a text in PDFBox 2.0?

Asked Feb 01 '16 at 09:22

I am using this example source code: https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintTextLocations.java?revision=1709154&view=co
Right now the output is one character at a time:

String[56.8,67.900024 fs=12.0 xscale=12.0 height=7.7100005 space=3.0 width=6.0]h

String[56.8,67.900024 fs=12.0 xscale=12.0 height=7.7100005 space=3.0 width=6.0]hello

asked Feb 01 '16 at 09:22

Han IT

In 2.0, text extraction is more character-oriented than in 1.8. But even in 1.8 there is no guarantee that you'd always get whole words. PDF content streams often look like this: [ ( ISO 19005-1:2005, Docum) 8 (e) -1 (nt m) 8 (a) -1 (nagem) 8 (e) -1 (nt \227 ) ] TJ . So you'll have to use your own logic if you want the position of a *word*. See also https://stackoverflow.com/questions/12354266/pdfbox-getting-words-locations-and-not-only-characters and https://stackoverflow.com/questions/11873801/using-pdfbox-to-determine-the-coordinates-of-words-in-a-document – Tilman Hausherr Feb 01 '16 at 09:35
Thanks. I think PDFBox has no function to get the position of words. I will try to follow your guidance. – Han IT Feb 01 '16 at 09:51
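
The "own logic" mentioned in the comment above could look roughly like the sketch below. It is not part of PDFBox: `WordGrouper`, `Glyph`, and `Word` are hypothetical names, and the thresholds (half a space width for a word gap, 0.5 units for a line change) are assumptions that would need tuning per document. In PDFBox 2.0 the glyph values could plausibly be filled from each `TextPosition` (`getUnicode()`, `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()`, `getWidthOfSpace()`) inside an overridden `PDFTextStripper.writeString(String, List<TextPosition>)`; the sketch itself uses only plain Java so that the grouping step can be read in isolation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a PDFBox class): merges per-character positions into words. */
final class WordGrouper {

    /** One character with its left x, its y, its width, and the font's space width. */
    static final class Glyph {
        final String text;
        final float x, y, width, spaceWidth;
        Glyph(String text, float x, float y, float width, float spaceWidth) {
            this.text = text; this.x = x; this.y = y;
            this.width = width; this.spaceWidth = spaceWidth;
        }
    }

    /** A word, positioned at its first glyph, with the summed horizontal extent. */
    static final class Word {
        final String text;
        final float x, y, width;
        Word(String text, float x, float y, float width) {
            this.text = text; this.x = x; this.y = y; this.width = width;
        }
    }

    /** Groups glyphs, given in reading order, into words. */
    static List<Word> group(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        float startX = 0, startY = 0, endX = 0;
        Glyph prev = null;
        for (Glyph g : glyphs) {
            boolean isSpace = g.text.trim().isEmpty();
            // Assumed threshold: a y change above 0.5 units means a new line.
            boolean newLine = prev != null && Math.abs(g.y - prev.y) > 0.5f;
            // Assumed threshold: a horizontal gap above half a space width splits words.
            boolean gap = prev != null
                    && g.x - (prev.x + prev.width) > 0.5f * prev.spaceWidth;
            if (text.length() > 0 && (isSpace || newLine || gap)) {
                words.add(new Word(text.toString(), startX, startY, endX - startX));
                text.setLength(0);
            }
            if (!isSpace) {
                if (text.length() == 0) {
                    startX = g.x;   // the word starts at its first glyph
                    startY = g.y;
                }
                text.append(g.text);
                endX = g.x + g.width;
            }
            prev = g;
        }
        if (text.length() > 0) {
            words.add(new Word(text.toString(), startX, startY, endX - startX));
        }
        return words;
    }
}
```

Checking both explicit space glyphs and position gaps matters because, as the TJ example shows, a PDF may move the text position with kerning numbers instead of emitting a space character.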

0 Answers