I convert an image using ABBYY's OCR SDK:

CSafePtr<IFRDocument> frDocument = ...;
frDocument->AddImageFile( "C:\\test\\input.tif" );
frDocument->Process( 0 );
frDocument->Export( "C:\\test\\output.rtf", FEF_RTF, 0  );

But now I also need the character bounding boxes and confidence levels. Tesseract provides them, so I assume ABBYY's SDK can too.

How do I get the bounding boxes and confidence levels?

sashoalm

1 Answer

I eventually found out how to do it: you need to use IPlainText::GetCharacterData(). The documentation describes it as follows:

GetCharacterData Method of the PlainText Object

This method returns the information about all characters in the text as a set of arrays: the page numbers on which the characters are located, the coordinates of characters' rectangles, and characters' confidences.

Example:

CSafePtr<IPlainText> plainText;
frDocument->get_PlainText( &plainText );
// Parallel output arrays: element i of each array describes the same character.
SAFEARRAY *pageNumbers = 0, *leftBorders = 0, *topBorders = 0, *rightBorders = 0,
          *bottomBorders = 0, *confidences = 0, *isSuspicious = 0;
plainText->GetCharacterData( &pageNumbers, &leftBorders, &topBorders, &rightBorders,
                             &bottomBorders, &confidences, &isSuspicious );
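Since the call returns parallel arrays rather than one record per character, it can be convenient to zip them together before use. The following is only a rough sketch, not part of the ABBYY SDK: it assumes the contents of each SAFEARRAY have already been copied into standard vectors (for example with the Win32 SafeArrayAccessData / SafeArrayUnaccessData pair) and that the numeric arrays hold integers. `CharBox` and `zipCharacterData` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical record type (not an SDK type): one entry per recognized character.
struct CharBox {
    int page;                      // page number the character is on
    int left, top, right, bottom;  // bounding rectangle coordinates
    int confidence;                // recognition confidence as reported by the engine
    bool suspicious;               // true if the engine flagged the character
};

// Combine the parallel arrays into one record per character.
// Assumes all seven arrays were filled by the same GetCharacterData call,
// so they must have the same length; a mismatch is reported as an error.
std::vector<CharBox> zipCharacterData(
    const std::vector<int>& pageNumbers,
    const std::vector<int>& leftBorders,
    const std::vector<int>& topBorders,
    const std::vector<int>& rightBorders,
    const std::vector<int>& bottomBorders,
    const std::vector<int>& confidences,
    const std::vector<bool>& isSuspicious)
{
    const std::size_t n = pageNumbers.size();
    if (leftBorders.size() != n || topBorders.size() != n ||
        rightBorders.size() != n || bottomBorders.size() != n ||
        confidences.size() != n || isSuspicious.size() != n) {
        throw std::invalid_argument("character data arrays differ in length");
    }

    std::vector<CharBox> boxes;
    boxes.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        boxes.push_back(CharBox{pageNumbers[i], leftBorders[i], topBorders[i],
                                rightBorders[i], bottomBorders[i],
                                confidences[i], isSuspicious[i]});
    }
    return boxes;
}
```

With the records in one vector, filtering is a simple loop, e.g. collecting every `CharBox` whose `suspicious` flag is set. Remember that the SAFEARRAYs themselves are owned by the caller and should be released (SafeArrayDestroy) once their contents have been copied.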
sashoalm