
I am using PoDoFo to parse a PDF. I have the source code of podofotxtextract from podofo-tools (file TextExtractor.cpp), and I would like to change it to get the text coordinates. After reading the post "PoDoFo extract text" and the Adobe specification, I understand that to retrieve the position information I need the Tm operator, whose operands form a matrix. So I added a new case to the source code, like this:

... other case in if( bTextBlock ) ....
else if( strcmp( pszToken, "Tm" ) == 0 )
{
     std::cout << "I have matrix here Tm " <<  std::endl;                   
}
....

This code works, but I do not know how to display the values of the matrix. Adobe specifies: [image of the Tm matrix from the specification, not preserved]

Can anyone help me?

simon
  • *"how to display the values of the matrix"* - for which purpose? – mkl Aug 31 '16 at 09:58
  • Hmm, I've found the solution! PoDoFo provides `GetReal()`; this function can be used with `std::stack`, and with it I finally get the coordinates of the text. – simon Aug 31 '16 at 10:09

1 Answer


I recommend reading "PDF Succinctly" by Ryan Hodson (the first Google result takes you to it) and the Adobe PDF Reference (which was suggested in a similar question): https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf

First of all, you are misinterpreting how PDFs are "coded". PoDoFo will give you variants (the operands) and tokens (the operators). You are supposed to stack the variants and process them once you find a token (except for a few cases where the token opens or closes a block, such as BT and ET, which won't have any variants).

By the time you reach the Tm token you should have 6 variants stacked, which correspond to the entries of the matrix you pasted. They were pushed in reading order, so the last one is on top of the stack.
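To make the stacking idea concrete, here is a minimal sketch that does not use PoDoFo at all. It assumes a simplified content stream split on whitespace, and the names `TextMatrix`, `popTextMatrix` and `collectTextMatrices` are hypothetical, made up for this illustration. In TextExtractor.cpp the stack would hold variants instead of plain doubles, and, as noted in the comment on the question, `GetReal()` would read each one.

```cpp
#include <array>
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <stack>
#include <string>
#include <vector>

// The six operands of "a b c d e f Tm"; e and f are the x/y translation.
using TextMatrix = std::array<double, 6>;

// Hypothetical helper: pops the six most recent operands.
// They were pushed in the order a..f, so f is on top and the
// array has to be filled from the back.
TextMatrix popTextMatrix(std::stack<double>& operands)
{
    assert(operands.size() >= 6);  // caller must check before calling
    TextMatrix m{};
    for (int i = 5; i >= 0; --i) {
        m[i] = operands.top();
        operands.pop();
    }
    return m;
}

// Hypothetical stand-in for the tokenizer loop: numbers are stacked,
// and an operator consumes whatever is on the stack.
std::vector<TextMatrix> collectTextMatrices(const std::string& content)
{
    std::vector<TextMatrix> result;
    std::stack<double> operands;
    std::istringstream in(content);
    std::string token;
    while (in >> token) {
        char* end = nullptr;
        const double value = std::strtod(token.c_str(), &end);
        if (end != token.c_str() && *end == '\0') {
            operands.push(value);  // operand: keep it until an operator arrives
        } else if (token == "Tm" && operands.size() >= 6) {
            result.push_back(popTextMatrix(operands));
        } else {
            // Any other token: this sketch simply discards pending operands.
            while (!operands.empty()) operands.pop();
        }
    }
    return result;
}
```

For a fragment such as `BT /F1 12 Tf 1 0 0 1 72.5 700 Tm (Hello) Tj ET`, the returned matrix holds the translation in its last two entries, which are the coordinates the question is after. Whitespace splitting is a shortcut for the sketch only; a real content stream needs a proper tokenizer.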

Coyoteazul