I am working with Pytesseract and would like to convert hOCR output into a plain-text string. Pytesseract already provides `image_to_string` to get plain text straight from an image, but I would like to understand the possible strategies for extracting the text from the hOCR output myself. Thanks!
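```python
from PIL import Image
from pytesseract import image_to_pdf_or_hocr
image = Image.open("page.png")  # hypothetical input file
hocr_output = image_to_pdf_or_hocr(image, extension='hocr')  # returns bytes (XHTML)
```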
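One possible strategy, sketched here under some assumptions: Tesseract's hOCR is XHTML, so it can be parsed with the standard library's `xml.etree.ElementTree` and the text rebuilt from the class names hOCR uses. Words are `ocrx_word` elements, lines are `ocr_line` (Tesseract may also emit `ocr_header`, `ocr_caption` or `ocr_textfloat` for line-like elements), and paragraphs are `ocr_par`. The function name `hocr_to_text` and the `SAMPLE_HOCR` snippet are hypothetical, and the sketch assumes the input is well-formed XML that groups lines into `ocr_par` paragraphs, as Tesseract's output normally does.

```python
import xml.etree.ElementTree as ET

# Class names treated as "a line of text" (assumption: Tesseract 4+ hOCR output).
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


def _classes(elem):
    """Return the set of CSS classes on an element (empty set if none)."""
    return set(elem.get("class", "").split())


def hocr_to_text(hocr: bytes) -> str:
    """Hypothetical helper: words joined by spaces, lines by newlines,
    paragraphs by blank lines."""
    root = ET.fromstring(hocr)  # bytes, so the XML encoding declaration is honoured
    paragraphs = []
    for par in root.iter():
        if "ocr_par" not in _classes(par):
            continue
        lines = []
        for line in par.iter():
            if not (_classes(line) & LINE_CLASSES):
                continue
            # itertext() also picks up text nested in <strong>/<em> inside a word
            words = [
                "".join(word.itertext()).strip()
                for word in line.iter()
                if "ocrx_word" in _classes(word)
            ]
            words = [w for w in words if w]
            if words:
                lines.append(" ".join(words))
        if lines:
            paragraphs.append("\n".join(lines))
    return "\n\n".join(paragraphs)


# Small hand-written hOCR fragment for illustration (not real Tesseract output).
SAMPLE_HOCR = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div class='ocr_page' id='page_1'>
   <p class='ocr_par' id='par_1_1'>
    <span class='ocr_line' id='line_1_1'>
     <span class='ocrx_word' id='word_1_1'>Hello</span>
     <span class='ocrx_word' id='word_1_2'><strong>world</strong></span>
    </span>
    <span class='ocr_line' id='line_1_2'>
     <span class='ocrx_word' id='word_1_3'>Second</span>
     <span class='ocrx_word' id='word_1_4'>line</span>
    </span>
   </p>
   <p class='ocr_par' id='par_1_2'>
    <span class='ocr_header' id='line_1_3'>
     <span class='ocrx_word' id='word_1_5'>Tom</span>
     <span class='ocrx_word' id='word_1_6'>&amp;</span>
     <span class='ocrx_word' id='word_1_7'>Jerry</span>
    </span>
   </p>
  </div>
 </body>
</html>"""

print(hocr_to_text(SAMPLE_HOCR))
```

On the sample this prints "Hello world" and "Second line" on two lines, a blank line, then "Tom & Jerry". With the snippet from the question, `hocr_to_text(hocr_output)` should give text close to what `image_to_string` returns, though whitespace and line breaks may not match exactly. Walking the class names rather than dumping all text nodes keeps the layout (lines and paragraphs) and skips the metadata in the `<head>`; if some hOCR input is not well-formed XML, `html.parser` or BeautifulSoup (e.g. `soup.select('.ocrx_word')`) would be a more lenient alternative.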