Any suggestions on converting these images to text? I'm using pytesseract and it's working wonderfully in most cases except this. Ideally I'd read these numbers exactly. Worst case I can just try to use PIL to determine if the number to the left of the '/' is a zero. Start from the left and find the first white pixel, then
from PIL import Image
from pytesseract import image_to_string
myText = image_to_string(Image.open("tmp/test.jpg"),config='-psm 10')
myText = image_to_string(Image.open("tmp/test.jpg"))
The slash in the middle causes issues here. I've also tried using PIL's '.paste' to add lots of extra black around the image. There might be a few other PIL tricks I could try, but i'd rather not go that route unless I have to.
I tried using config='-psm 10' but my 8's were coming through as ":" sometimes, and random characters other times. And my 0's were coming through as nothing.
Reference to: pytesseract don't work with one digit image for the -psm 10
_____________EDIT_______________ Additional samples:
So I'm doing some voodoo conversions that seem to be working for now. But looks very error prone:
def ConvertPPTextToReadableNumbers(text):
text = RemoveNonASCIICharacters(text)
text = text.replace("I]", "0")
text = text.replace("|]", "0")
text = text.replace("l]", "0")
text = text.replace("B", "8")
text = text.replace("D", "0")
text = text.replace("S", "5")
text = text.replace(".I'", "/")
text = text.replace(".I", "/")
text = text.replace("I'", "/")
text = text.replace("J", "/")
return text
Ultimately generates:
ConvertPPTextToReadableNumbers return text = 18/20
ConvertPPTextToReadableNumbers return text = 0/5
ConvertPPTextToReadableNumbers return text = 10/10
ConvertPPTextToReadableNumbers return text = 20/20