Generate hOCR from Microsoft Computer Vision OCR

Question

While I can obtain the OCR data in JSON format, I would like to get the output as hOCR. I found this resource: https://learn.microsoft.com/en-us/samples/azure-samples/azure-search-power-skills/azure-hocr-generator-sample/. It's not clear to me how to implement this in relation to the Python script in the tutorial.

Does anybody know how to convert the JSON output to hOCR?

Which version of the Read API are you using? The 3.0 API has improved accuracy, confidence scores for words, support for both Spanish and English via the additional language parameter, and a different output format. — Ram, Jun 01 '20 at 12:29
I've been using 2.1 and will look into 3.0. Still hoping someone can offer an hOCR conversion. — paratext, Jun 01 '20 at 17:03
Would it help if I gave you a parser to convert the JSON to a dataframe that has all the words, their coordinates (x, y, w, h), and confidence as columns? — Sreekiran A R, Jun 09 '20 at 17:22
Thanks, Sreekiran, but I've written something similar to that for other use cases. Now I'm looking for either hOCR or ALTO output so that the text can be edited in a human-readable way in Oxygen XML Editor. — paratext, Jun 19 '20 at 15:12
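
For reference, a conversion along the lines discussed above might look roughly like the following. This is only a minimal sketch, not the linked sample's method. It assumes the Read 3.0 response layout (`analyzeResult.readResults[]`, each page with `width`, `height`, and `lines[]`, each line with `words[]`, and every `boundingBox` given as eight numbers `x1, y1, ..., x4, y4`), and it assumes coordinates are in pixels. The function names `polygon_to_bbox` and `read_json_to_hocr` are hypothetical. Read 2.x output uses different top-level keys, so the lookup would need adjusting there.

```python
from html import escape


def polygon_to_bbox(polygon):
    """Collapse an 8-number polygon into an hOCR 'x0 y0 x1 y1' string."""
    xs, ys = polygon[0::2], polygon[1::2]
    return f"{round(min(xs))} {round(min(ys))} {round(max(xs))} {round(max(ys))}"


def read_json_to_hocr(result):
    """Build an hOCR document from a parsed Read 3.0 JSON response (assumed layout)."""
    out = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        "<head>",
        '<meta name="ocr-capabilities" content="ocr_page ocr_line ocrx_word" />',
        "</head>",
        "<body>",
    ]
    for page in result["analyzeResult"]["readResults"]:
        page_no = page.get("page", 1)
        page_box = f'0 0 {round(page["width"])} {round(page["height"])}'
        out.append(f'<div class="ocr_page" id="page_{page_no}" title="bbox {page_box}">')
        for li, line in enumerate(page.get("lines", []), start=1):
            line_box = polygon_to_bbox(line["boundingBox"])
            out.append(
                f'<span class="ocr_line" id="line_{page_no}_{li}" title="bbox {line_box}">'
            )
            for wi, word in enumerate(line.get("words", []), start=1):
                title = "bbox " + polygon_to_bbox(word["boundingBox"])
                conf = word.get("confidence")
                # Read 3.0 reports confidence as a 0..1 float; hOCR x_wconf is 0..100.
                if isinstance(conf, (int, float)):
                    title += f"; x_wconf {round(conf * 100)}"
                out.append(
                    f'<span class="ocrx_word" id="word_{page_no}_{li}_{wi}" '
                    f'title="{title}">{escape(word["text"])}</span>'
                )
            out.append("</span>")
        out.append("</div>")
    out += ["</body>", "</html>"]
    return "\n".join(out)
```

Calling `read_json_to_hocr(json.load(f))` on a saved response and writing the returned string to a `.hocr` file should give something an XML editor can open. Pages returned in inches (as can happen for PDFs) would need scaling to pixels first, which this sketch does not attempt.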

0 Answers