I have a PDF containing several geometric objects (mostly lines) in different sizes and colors. I want to extract them in the following form, e.g. for lines:
- (startx, starty)
- (endx, endy)
- width
- color
Optionally, a "z" position that determines which object is drawn first. My language of choice is C++, and I thought about using PoDoFo, or rather pdfmm, as it should be more accessible. However, I am totally lost as to how to access this information.
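To make the target format concrete, here is a minimal sketch of the record I would like to end up with for each line. All names (`Point`, `LineRecord`, `sortByPaintOrder`) are hypothetical and my own, not part of PoDoFo or pdfmm, and I am assuming an RGB color and a plain integer index for the "z" position.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; nothing here comes from PoDoFo/pdfmm.
struct Point {
    double x;
    double y;
};

struct LineRecord {
    Point start;                 // (startx, starty)
    Point end;                   // (endx, endy)
    double width;                // line width
    std::array<double, 3> rgb;   // color, assumed RGB with components in 0..1
    std::size_t z;               // paint order: 0 is drawn first
};

// Sorts records so that the object drawn first comes first.
void sortByPaintOrder(std::vector<LineRecord>& lines)
{
    std::stable_sort(lines.begin(), lines.end(),
                     [](const LineRecord& a, const LineRecord& b) { return a.z < b.z; });
}
```

If the objects are simply collected in the order they appear on the page, "z" could just be a running counter.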
I found the following reference: PDF parsing in C++ (PoDoFo). However, I was not able to make PdfTokenizer work. TryReadNextToken needs an InputStreamDevice object, and I do not know how to get one.
For example, I create a single page containing just one line with pdfmm, and now I want to extract this information again:
#include <pdfmm/pdfmm.h>

using namespace mm; // assumption: the pdfmm types live in namespace mm

int main()
{
    try {
        PdfMemDocument document;
        document.Load("test.pdf");
        PdfPage* page = document.GetPages().CreatePage(PdfPage::CreateStandardPageSize(PdfPageSize::A4));

        // Draw a single diagonal line across the page
        PdfPainter painter;
        painter.SetCanvas(page);
        painter.GetGraphicsState().SetLineWidth(10);
        painter.DrawLine(0, 0, page->GetRect().GetWidth(), page->GetRect().GetHeight());
        painter.FinishDrawing();

        // Loop over all tokens of the page
        PdfTokenizer token(true);
        char* stoken = nullptr;
        PdfVariant var;
        PdfContentType type;
        // What do I pass as the first argument (the InputStreamDevice)?
        while (token.TryReadNextToken( ???? ,stoken,type)) {
        }
    }
    catch (PdfError& err)
    {
        err.PrintErrorMsg();
        return (int)err.GetError();
    }
    return 0;
}
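To show what I mean by "extract", here is a library-free sketch of the result I am after. It assumes I already had the decoded page content as plain text, which is exactly the part I do not know how to get from pdfmm. As far as I understand the PDF operators, my line should show up as something like `10 w 0 0 m 595 842 l S`. The sketch only interprets `w` (line width), `RG` (stroke color), `m` (move to), `l` (line to) and `S` (stroke), and it ignores transformations, the graphics state stack and every other operator. `RawLine` and `extractLines` are hypothetical names of my own.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for illustration; not a pdfmm or PoDoFo type.
struct RawLine {
    double x0, y0, x1, y1;  // endpoints in user space
    double width;           // line width in effect when the path is stroked
    double r, g, b;         // stroke color, assumed DeviceRGB in 0..1
};

// Scans an already-decoded content stream given as whitespace-separated text.
// Only w, RG, m, l and S are interpreted; all other operators are skipped.
std::vector<RawLine> extractLines(const std::string& content)
{
    std::vector<RawLine> stroked;   // finished lines, in paint order
    std::vector<RawLine> path;      // segments of the path being built
    std::vector<double> operands;   // numbers waiting for their operator
    double width = 1.0, r = 0.0, g = 0.0, b = 0.0;  // initial graphics state
    double curX = 0.0, curY = 0.0;                  // current point

    std::istringstream in(content);
    std::string tok;
    while (in >> tok) {
        char* end = nullptr;
        const double value = std::strtod(tok.c_str(), &end);
        if (end != tok.c_str() && *end == '\0') {   // token is a number
            operands.push_back(value);
            continue;
        }
        const std::size_t n = operands.size();
        if (tok == "w" && n >= 1) {
            width = operands[n - 1];
        } else if (tok == "RG" && n >= 3) {
            r = operands[n - 3]; g = operands[n - 2]; b = operands[n - 1];
        } else if (tok == "m" && n >= 2) {
            curX = operands[n - 2]; curY = operands[n - 1];
        } else if (tok == "l" && n >= 2) {
            path.push_back({curX, curY, operands[n - 2], operands[n - 1],
                            0.0, 0.0, 0.0, 0.0});
            curX = operands[n - 2]; curY = operands[n - 1];
        } else if (tok == "S") {
            // Width and color are taken from the state at stroke time.
            for (RawLine seg : path) {
                seg.width = width; seg.r = r; seg.g = g; seg.b = b;
                stroked.push_back(seg);
            }
            path.clear();
        }
        operands.clear();  // operands belong to the operator just seen
    }
    return stroked;
}
```

Calling `extractLines("1 0 0 RG 10 w 0 0 m 595 842 l S")` would give one record per stroked segment, in the order they are painted. What I am missing is the library part: getting the tokens or operators of a real page, including compressed streams, through PoDoFo or pdfmm.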
If anybody could push me in the right direction, that would be awesome! And if somebody knows of good documentation on the structure of a PDF and/or a good tutorial for pdfmm / PoDoFo, that would also be highly appreciated.