Hi, I'm working on a PDF viewer and I want to extract all the contents of a PDF. Will CGPDFScanner get all the contents of the PDF?

The Apple documentation is very brief, and it is difficult to implement anything from the explanations given. A lot of googling has also led nowhere.

So can someone explain the purpose and use of the following:

1. CGPDFOperatorTableRef

2. CGPDFOperatorTableSetCallback

3. CGPDFScannerRef

4. CGPDFContentStreamRef

Once all this is done, how do I view the data obtained from parsing?

Thanks in advance.

pnuts
cancerian

2 Answers


Parsing PDF content is not a big deal; what is harder is highlighting searched text in the PDF.

For parsing, follow the post at the URL below.

http://www.random-ideas.net/posts/42

For a complete reader, get the code below (though it shows a clumsy logo).

https://github.com/mobfarm/FastPdfKit

Thanks

user869123
  • Hi, thanks for replying, but the random-ideas link doesn't seem to work. Can you post a code sample? – cancerian Aug 02 '11 at 05:15
  • Use archive.org and try looking up the random-ideas URL. It was available in the recent past. – user869123 Aug 03 '11 at 13:09
  • FastPdfKit does search and highlight. I need highlighting by touch and drag, and I should be able to save the highlighted text in the PDF. FastPdfKit doesn't do that. Any ideas? – cancerian Aug 03 '11 at 16:03

The CGPDFScanner parses a PDF graphic content stream (page content or form XObject content). This is very low-level PDF: you have to know the PDF specification in order to interpret the results of the parsing.

The scanner calls a function of yours (a callback) every time it encounters an operator you are interested in. The CGPDFOperatorTable stores the list of operators you want to be notified about, and CGPDFOperatorTableSetCallback adds an entry to that table by associating an operator name with its callback. If you want to extract all content, you have to fill this table with all the PDF graphic operators. Each callback is called when the scanner finds its operator in the PDF content stream.

The CGPDFScannerRef is the PDF scanner, and the CGPDFContentStreamRef is a PDF content stream, that is, a stream associated with a PDF object. The content of this stream depends on the PDF object the stream is associated with.
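
To make the relationship between the table, the scanner, and the callbacks more concrete, here is a toy model in plain C. It does not use Core Graphics at all: `ToyOperatorTable`, `ToyScanner`, `TextSink`, `toy_table_set_callback`, `toy_scan` and `toy_extract_text` are made-up names for illustration, and the tokenizer only understands simple operands (no nested parentheses, escapes, arrays, or dictionaries). In the real API the corresponding calls would be `CGPDFOperatorTableCreate`, `CGPDFOperatorTableSetCallback`, `CGPDFContentStreamCreateWithPage`, `CGPDFScannerCreate` and `CGPDFScannerScan`, with callbacks reading their operands through functions such as `CGPDFScannerPopString`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPS   16
#define MAX_STACK 32
#define MAX_TOKEN 64

/* Toy stand-in for CGPDFScannerRef: it only holds the operand stack. */
typedef struct {
    char operands[MAX_STACK][MAX_TOKEN];
    int  count;
} ToyScanner;

/* A callback receives the scanner plus a user-supplied "info" pointer. */
typedef void (*ToyCallback)(ToyScanner *scanner, void *info);

/* Toy stand-in for CGPDFOperatorTableRef: operator name -> callback. */
typedef struct {
    const char *names[MAX_OPS];
    ToyCallback callbacks[MAX_OPS];
    int         count;
} ToyOperatorTable;

/* Hypothetical "info" object: a caller-owned buffer that collects text. */
typedef struct { char *buf; size_t size; } TextSink;

static void toy_table_set_callback(ToyOperatorTable *t, const char *name, ToyCallback cb) {
    if (t->count < MAX_OPS) {
        t->names[t->count] = name;
        t->callbacks[t->count] = cb;
        t->count++;
    }
}

/* Pop the most recent operand; returns 1 on success, 0 if the stack is empty. */
static int toy_scanner_pop(ToyScanner *s, char *out, size_t n) {
    if (s->count == 0) return 0;
    s->count--;
    snprintf(out, n, "%s", s->operands[s->count]);
    return 1;
}

/* Callback for "Tj" (show text): pop the string operand and append it. */
static void op_Tj(ToyScanner *scanner, void *info) {
    TextSink *sink = info;
    char str[MAX_TOKEN];
    if (toy_scanner_pop(scanner, str, sizeof str)) {
        size_t used = strlen(sink->buf);
        snprintf(sink->buf + used, sink->size - used, "%s%s", used ? " " : "", str);
    }
}

static int is_space(char c) { return c == ' ' || c == '\n' || c == '\t' || c == '\r'; }

/* Walk the stream: operands are pushed, operators trigger registered callbacks. */
static void toy_scan(const char *p, const ToyOperatorTable *table, void *info) {
    ToyScanner scanner = { .count = 0 };
    while (*p) {
        char token[MAX_TOKEN];
        size_t len = 0;
        int is_operand;
        while (is_space(*p)) p++;
        if (!*p) break;
        if (*p == '(') {                      /* literal string operand */
            p++;
            while (*p && *p != ')') { if (len < MAX_TOKEN - 1) token[len++] = *p; p++; }
            if (*p == ')') p++;
            is_operand = 1;
        } else {                              /* name, number, or operator */
            is_operand = (*p == '/' || *p == '-' || *p == '.' || (*p >= '0' && *p <= '9'));
            while (*p && !is_space(*p) && *p != '(') { if (len < MAX_TOKEN - 1) token[len++] = *p; p++; }
        }
        token[len] = '\0';
        if (is_operand) {
            if (scanner.count < MAX_STACK)
                snprintf(scanner.operands[scanner.count++], MAX_TOKEN, "%s", token);
        } else {
            for (int i = 0; i < table->count; i++) {
                if (strcmp(table->names[i], token) == 0) { table->callbacks[i](&scanner, info); break; }
            }
            scanner.count = 0;                /* operands belong to one operator only */
        }
    }
}

/* Register a single operator and scan; unregistered operators are ignored. */
void toy_extract_text(const char *stream, char *out, size_t n) {
    ToyOperatorTable table = { .count = 0 };
    TextSink sink = { out, n };
    assert(out != NULL && n > 0);
    out[0] = '\0';
    toy_table_set_callback(&table, "Tj", op_Tj);
    toy_scan(stream, &table, &sink);
}
```

With only `Tj` registered, calling `toy_extract_text` on the stream `BT /F1 12 Tf (Hello) Tj 10 0 Td (World) Tj ET` leaves `Hello World` in the output buffer. The `BT`, `Tf`, `Td` and `ET` operators are skipped because nothing was registered for them, which is why extracting everything means registering a callback for every operator.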

iPDFdev
  • Hi, thanks for the explanation. It makes sense now. Can you tell me how to check the content of this stream in the console? Any sample code? – cancerian Aug 02 '11 at 05:17
  • I'm not sure what you mean by "check the content of this stream in console". – iPDFdev Aug 02 '11 at 07:28
  • The content depends on the type of the object the stream is included in. If it is an image object, the stream is the image data. If it is a generic stream referenced from the page's Contents entry, then you have the page content stream. Not all PDF objects have streams associated with them. You have to read the PDF specification in order to know what each object represents. – iPDFdev Aug 03 '11 at 07:34