
I have looked through many tutorials, and Stack Overflow users usually point to PDFKitten, but after testing it I am not satisfied with the result. For example, its search does not work with multiple words.

What I am looking for is a way to get all the words on a PDF page and highlight each word that intersects a given rectangle.

Matrosov Oleksandr
  • Did you ever find a solution to this? If so, please add it as an answer so it can help others. Thanks! – Hemang May 31 '17 at 03:53

1 Answer


I used PDFKitten for the same purpose.

  • While scanning the PDF, I identified the words by splitting on spaces.
  • When the first character of a word is encountered, save that word in a model together with the current RenderingState (a model class in the PDFKitten code) as its initial state. When the complete word has been read (that is, the next space is reached), save the current RenderingState again as its final state.
  • PDFKitten already contains the code that converts a RenderingState into a frame in the actual view, using the initial and final states above. You can refer to that code.
  • Apply the current media box transform to the frame.
  • Finally, don't forget to convert the resulting frame into the user's coordinate system. Otherwise the frame will come out reversed.
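
The steps above can be sketched independently of PDFKitten. The following is a minimal, language-neutral illustration in Python of the geometry only. It is not PDFKitten's API and not the code from the app mentioned here. `Glyph`, `Rect`, `words_with_frames`, `to_view_space` and `words_crossing` are hypothetical names, and the sketch assumes an unrotated page, glyphs already decoded with known positions, and a view whose origin is at the top left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str      # decoded character
    x: float       # left edge in PDF page space (origin at bottom left)
    y: float       # bottom edge in PDF page space
    width: float
    height: float


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def intersects(self, other):
        # Standard axis-aligned overlap test (touching edges do not count).
        return (self.x < other.x + other.width and other.x < self.x + self.width
                and self.y < other.y + other.height and other.y < self.y + self.height)


def _close_word(glyphs):
    # The first glyph plays the role of the "initial state",
    # the last glyph the role of the "final state".
    first, last = glyphs[0], glyphs[-1]
    text = "".join(g.char for g in glyphs)
    height = max(g.height for g in glyphs)
    return text, Rect(first.x, first.y, last.x + last.width - first.x, height)


def words_with_frames(glyphs):
    """Group glyphs into space-separated words with a frame in page space."""
    words, current = [], []
    for g in glyphs:
        if g.char.isspace():
            if current:
                words.append(_close_word(current))
                current = []
        else:
            current.append(g)
    if current:
        words.append(_close_word(current))
    return words


def to_view_space(rect, media_box, scale=1.0):
    """Shift by the media box origin, scale, and flip y for a top-left origin."""
    x = (rect.x - media_box.x) * scale
    y = (media_box.height - (rect.y - media_box.y) - rect.height) * scale
    return Rect(x, y, rect.width * scale, rect.height * scale)


def words_crossing(glyphs, media_box, selection, scale=1.0):
    """Return the words whose view-space frame intersects the selection."""
    return [text for text, frame in words_with_frames(glyphs)
            if to_view_space(frame, media_box, scale).intersects(selection)]


# Usage: "Hi PDF" on a 200 x 100 page, each glyph 10 wide and 12 high.
page = Rect(0, 0, 200, 100)
line = [Glyph(c, 10 + 10 * i, 80, 10, 12) for i, c in enumerate("Hi PDF")]
print(words_crossing(line, page, Rect(35, 0, 20, 20)))  # ['PDF']
```

Skipping the y flip in `to_view_space` is what produces the reversed frame mentioned in the last step: the word near the top of the page would be highlighted near the bottom of the view.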
Swaroop
  • However, there is a problem with the CMap parsing in PDFKitten: it throws away many mappings. You may want to fix that first so that you get the correct character mapping. – Swaroop May 08 '15 at 08:02
  • Hi Swaroop, can you please share the code which you have explained here? – Hemang May 31 '17 at 09:05
  • Unfortunately, no. It's a commercial app, so I can't share the code. – Swaroop Jun 03 '17 at 02:00