  • I am using this source code: https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/util/PrintTextLocations.java?revision=1709154&view=co

  • The current output is one line per character:

String[56.8,67.900024 fs=12.0 xscale=12.0 height=7.7100005 space=3.0 width=6.0]h

  • I want the result to cover multiple characters, i.e. a whole word:

String[56.8,67.900024 fs=12.0 xscale=12.0 height=7.7100005 space=3.0 width=6.0]hello

Han IT
  • In 2.0, text extraction is more character-oriented than in 1.8. But even in 1.8 there is no guarantee that you'd always get whole words. PDF content streams often look like this: [ ( ISO 19005-1:2005, Docum) 8 (e) -1 (nt m) 8 (a) -1 (nagem) 8 (e) -1 (nt \227 ) ] TJ . So you'll have to use your own logic if you want the position of a *word*. See also https://stackoverflow.com/questions/12354266/pdfbox-getting-words-locations-and-not-only-characters and https://stackoverflow.com/questions/11873801/using-pdfbox-to-determine-the-coordinates-of-words-in-a-document – Tilman Hausherr Feb 01 '16 at 09:35
  • Thanks. I think PDFBox has no function to get the position of words. I will try to follow your guidance. – Han IT Feb 01 '16 at 09:51
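
The "own logic" mentioned in the comment above could look roughly like the following sketch. It does not call PDFBox at all: `Glyph` is a hypothetical stand-in for the per-character values that a `TextPosition` reports (for example via `getUnicode()`, `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()` and `getWidthOfSpace()`), and `WordGrouper`, `Word` and `groupWords` are illustrative names. The rule that a word ends at a blank character, at a horizontal gap wider than half a space, or at a baseline shift of more than 0.5 units is an assumption that would need tuning for real documents.

```java
import java.util.ArrayList;
import java.util.List;

final class WordGrouper {

    /** Hypothetical stand-in for one extracted character and its metrics. */
    static final class Glyph {
        final String text;
        final float x, y, width, spaceWidth;

        Glyph(String text, float x, float y, float width, float spaceWidth) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.spaceWidth = spaceWidth;
        }
    }

    /** A run of adjacent glyphs, positioned at its first glyph. */
    static final class Word {
        final String text;
        final float x, y, width;

        Word(String text, float x, float y, float width) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
        }

        @Override
        public String toString() {
            return "String[" + x + "," + y + " width=" + width + "]" + text;
        }
    }

    /** Groups glyphs, given in reading order, into words. */
    static List<Word> groupWords(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float startX = 0, startY = 0, endX = 0;

        for (Glyph g : glyphs) {
            boolean blank = g.text.trim().isEmpty();
            boolean open = current.length() > 0;
            // Assumed heuristic: a baseline shift or a gap wider than
            // half a space means the previous word has ended.
            boolean gap = open
                    && (Math.abs(g.y - startY) > 0.5f
                        || g.x - endX > 0.5f * g.spaceWidth);

            if (open && (blank || gap)) {
                words.add(new Word(current.toString(), startX, startY, endX - startX));
                current.setLength(0);
            }
            if (blank) {
                continue; // blanks separate words but are not part of them
            }
            if (current.length() == 0) {
                startX = g.x;
                startY = g.y;
            }
            current.append(g.text);
            endX = g.x + g.width;
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), startX, startY, endX - startX));
        }
        return words;
    }

    public static void main(String[] args) {
        // Five glyphs laid out like the "h" line in the question, 6 units apart.
        List<Glyph> glyphs = new ArrayList<>();
        String s = "hello";
        for (int i = 0; i < s.length(); i++) {
            glyphs.add(new Glyph(String.valueOf(s.charAt(i)), 56.8f + 6f * i, 67.9f, 6f, 3f));
        }
        for (Word w : groupWords(glyphs)) {
            System.out.println(w);
        }
    }
}
```

To use this with PrintTextLocations, one option would be to collect the values of each `TextPosition` into such a list instead of printing them, and run the grouping once per page. The sketch assumes left-to-right text delivered in reading order and does not handle rotated text.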

0 Answers