I have a simple program (taken from the docTR documentation) that recognizes text in a PDF file. When the text is perfectly aligned, recognition works without problems, but when the document is rotated to the left or right, recognition starts to fail.
The documents I receive are not necessarily rotated by exactly 90, 180, or 270 degrees: crookedly scanned documents can arrive rotated by any angle (as in the pictures above).
I am looking for a way to straighten the table/text (or the whole page) in my PDF so that text recognition becomes easy, as in the picture below.
Similar solutions may already exist, but I have not found them yet. I would be grateful if you could point me to an existing solution or help me write my own.
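from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the pretrained OCR pipeline (text detection + recognition)
ocr = ocr_predictor(pretrained=True)
# Read the PDF pages as images
doc = DocumentFile.from_pdf("my/path.pdf")
# Run OCR and display the recognized text over the pages
result = ocr(doc)
result.show(doc)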
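For reference, one common way to straighten a page skewed by an arbitrary small angle is a projection-profile search: try a range of candidate rotations and keep the one that packs the dark pixels into the fewest, sharpest horizontal rows. The sketch below is a minimal, dependency-free illustration of that idea, not docTR functionality. The function name `estimate_skew_angle` and its parameters `max_angle` and `step` are hypothetical, and it assumes the dark-pixel coordinates have already been extracted from a thresholded page image. Recent docTR releases also appear to document `ocr_predictor` options for rotated pages (for example `assume_straight_pages=False`), so it is worth checking the documentation for the installed version first.

```python
import math


def estimate_skew_angle(dark_pixels, max_angle=15.0, step=0.5):
    """Estimate the rotation (in degrees) that makes text lines horizontal.

    dark_pixels: iterable of (x, y) coordinates of text pixels.
    The search covers [-max_angle, +max_angle] in increments of `step`.
    Hypothetical helper for illustration; not part of docTR.
    """
    points = list(dark_pixels)
    best_angle, best_score = 0.0, -1.0
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        # Histogram of the y coordinate after rotating every point by `angle`
        # (y' = x*sin + y*cos): this is the horizontal projection profile.
        profile = {}
        for x, y in points:
            row = int(round(x * sin_t + y * cos_t))
            profile[row] = profile.get(row, 0) + 1
        # Sum of squared row counts is largest when the pixels are
        # concentrated in a few rows, i.e. when text lines are level.
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


if __name__ == "__main__":
    # Synthetic "page": five horizontal text lines, then skewed by 5 degrees.
    skew = math.radians(5.0)
    skewed = [
        (x * math.cos(skew) - y * math.sin(skew),
         x * math.sin(skew) + y * math.cos(skew))
        for y in range(0, 100, 20)
        for x in range(200)
    ]
    # The estimate is the correcting rotation, so it should be about -5.
    print(estimate_skew_angle(skewed))
```

The returned angle is expressed in the same coordinate frame as the input points. Image libraries differ in their sign conventions (the y axis usually points down), so the sign should be verified on a sample page before rotating with whatever image library is in use. On real scans the pure-Python loop would be slow, so in practice the page would likely be downsampled or the loop vectorized. This approach only handles small skews; rotations near 90, 180, or 270 degrees would need a separate orientation check.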