I've used PDFTextStripperByArea to extract the text contained in a rectangle, but I'm looking to do the reverse: given a string and a PDPage, find the rectangle(s) where the string appears.

My understanding of PDFBox is pretty basic at this point. From the example linked from https://stackoverflow.com/a/35743173/64696 it looks like it is possible to get the coordinates of individual characters. If that is the best option, I guess the way forward is to write something that strings these characters together for comparison...

Are there any other options that I'm unaware of? Suggestions appreciated :-)
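To make the "string the characters together" idea concrete, here is a rough sketch of what I have in mind. It is not PDFBox code: `Glyph` and `TextLocator` are hypothetical names of my own, and it assumes the characters have already been extracted, in reading order and from a single line, each with its bounding box. It joins the characters into one string, searches it, and unions the boxes of each match.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class TextLocator {

    /** Hypothetical stand-in for one extracted character and its bounding box. */
    static final class Glyph {
        final char ch;
        final Rectangle2D box;

        Glyph(char ch, double x, double y, double width, double height) {
            this.ch = ch;
            this.box = new Rectangle2D.Double(x, y, width, height);
        }
    }

    /**
     * Returns one bounding rectangle per occurrence of needle.
     * Assumption: glyphs are in reading order and all on one line.
     */
    static List<Rectangle2D> find(List<Glyph> glyphs, String needle) {
        List<Rectangle2D> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits;
        }
        // Join the characters so that index i in the text maps to glyphs.get(i).
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            text.append(g.ch);
        }
        int from = 0;
        int start;
        while ((start = text.indexOf(needle, from)) >= 0) {
            // Union the boxes of every character in this match.
            Rectangle2D union = glyphs.get(start).box;
            for (int i = start + 1; i < start + needle.length(); i++) {
                union = union.createUnion(glyphs.get(i).box);
            }
            hits.add(union);
            from = start + 1; // continue after this match's first character
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up layout: every character is 10 units wide and 12 units high.
        String line = "hello lo";
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            glyphs.add(new Glyph(line.charAt(i), i * 10.0, 0.0, 10.0, 12.0));
        }
        for (Rectangle2D r : find(glyphs, "lo")) {
            System.out.println(r);
        }
    }
}
```

This only works if the characters really do arrive in reading order, which is exactly the part I'm unsure about.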

Dave Carpeneto
  • It can be difficult since the characters can be written in any order - There's no guarantee that the characters in your String will have been written in succession, so character-by-character parsing can easily fail to find the String. – nickb Jan 17 '17 at 20:46
  • @nickb it is possible to sort, there is an option. – Tilman Hausherr Jan 17 '17 at 21:52
  • Dave, the code from [this answer](http://stackoverflow.com/a/35987635/1729265) might help you. Depending on the strings you are looking for, though, you'll have to improve the code somewhat: it currently only searches inside the text forwarded in a single `writeString` call, which is at most a line and more often only a piece of one. – mkl Jan 18 '17 at 08:35
  • @TilmanHausherr The sorting is primitive IIRC, I believe it's based on absolute character coordinates and can end up corrupting the text when you have any sort of complex page layout - headers, footers, columns, image captions, callouts, etc. – nickb Jan 18 '17 at 13:55
  • @mkl: you have just saved me a ton of time. Your answer to the other question is working for the test cases I'm running against (single line is perfect for my needs), and is infinitely better than what I was starting to do. Thanks for this :-) – Dave Carpeneto Jan 19 '17 at 15:58
  • Ok, so your question essentially is a duplicate of that question? I'll mark it as such. – mkl Jan 19 '17 at 18:24
