I want to extract a number from an image using Tesseract OCR with Python, but Tesseract is not reading the number correctly. The image has the following format: [image]
The text is in Arial at font size 80. The code I am using is as follows:

    import pytesseract
    from PIL import Image

    # Path to the Tesseract executable (Windows install location)
    pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"

    def process_image(image_name, lang_code):
        # Run OCR on the image and return the recognized text
        return pytesseract.image_to_string(Image.open(image_name), lang=lang_code)

    def print_data(data):
        print(data)

    def main():
        data_eng = process_image("test.jpg", "eng")
        print_data(data_eng)

    if __name__ == '__main__':
        main()

With this code, Tesseract is not able to detect the number. I need to extract the number from around 200,000 images, so it would be really helpful if someone could suggest a fix or workaround. Any help is appreciated.
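
For reference, here is a minimal sketch of the kind of variation that might be worth trying. It is not a confirmed fix. It assumes each image contains a single line of digits, so it passes `--psm 7` (treat the image as a single text line) and a digit-only `tessedit_char_whitelist` through the `config` argument of `image_to_string`. Whether the whitelist is honored may depend on the Tesseract version and OCR engine in use. The helper names (`build_config`, `keep_digits`, `read_number`, `read_folder`), the grayscale conversion, and the folder of `.jpg` files are assumptions made for illustration.

```python
import re
from pathlib import Path


def build_config(psm=7, whitelist="0123456789"):
    """Build a Tesseract config string: page segmentation mode plus a character whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def keep_digits(text):
    """Drop everything except digits from the raw OCR output."""
    return re.sub(r"\D", "", text)


def read_number(image_path, lang_code="eng"):
    """OCR one image and return the digits found (empty string if none)."""
    # Imported here so the two helpers above can be used without Tesseract installed.
    # On Windows, pytesseract.pytesseract.tesseract_cmd must still point at tesseract.exe.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        gray = img.convert("L")  # grayscale; assumption: this may help on these images
        raw = pytesseract.image_to_string(gray, lang=lang_code, config=build_config())
    return keep_digits(raw)


def read_folder(folder):
    """Yield (file name, digits) for every .jpg in a folder (hypothetical layout)."""
    for path in sorted(Path(folder).glob("*.jpg")):
        yield path.name, read_number(path)
```

With this sketch, `read_number("test.jpg")` would return only the digits Tesseract recognized, and `read_folder` could be looped over to process the full set of images one file at a time.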
Thanks in advance.