
The image in question is linked here. I tried PyTesseract: it extracts the words reasonably well, but it does not read the numbers with acceptable precision, and it misses the numbers I need entirely. I want to write a program that reads numbers from four fixed locations in an image and stores them in a structured variable (list, dictionary, etc.). Since I need to do this for about 2,500 screenshots, picking the numbers out by hand is not an option, even if the OCR starts reading them correctly. PyTesseract returned the following output for the image above:

`Activities Boyer STA

Candle Version 4.1-9 IUAC, N.Delhi - BUILD (Tuesday 24 October 2017 04:















CL-F41. Markers:
—
896 13) 937.0
Back
Total,
Peak-1
Lprnenea dais cinasedl
Ee
1511 Show State

Proceed Append to File`

The code used to produce this output was:

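# Pillow exposes Image under the PIL package; fall back to the legacy import.
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# Path to the Tesseract executable on Windows.
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

# Run OCR on the whole screenshot and print the recognized text.
print(pytesseract.image_to_string(Image.open('C:/Users/vatsa/Desktop/Screenshot from 2020-06-15 21-41-06.png')))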

Referring to the image, I want to extract the numbers at the positions where 146.47, 915.16, 354.5 and 18.89 appear in this picture, for every screenshot, and save them, probably as a list. How can I do this in Python?
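To make the goal concrete, here is a minimal sketch of the kind of program I have in mind, assuming the four numbers sit at the same pixel positions in every screenshot. The box coordinates and field names below are placeholders, not measured values, and `read_fields` and `parse_number` are hypothetical helper names. The character whitelist is passed through Tesseract's `tessedit_char_whitelist` variable, which may be ignored by some Tesseract 4.0 builds using the LSTM engine.

```python
import re
from pathlib import Path

# Placeholder (left, top, right, bottom) boxes in pixels. These are NOT
# measured from the real screenshot and must be replaced with actual values.
FIELD_BOXES = {
    "value_1": (100, 200, 220, 230),
    "value_2": (100, 240, 220, 270),
    "value_3": (100, 280, 220, 310),
    "value_4": (100, 320, 220, 350),
}

# Treat each crop as a single text line made of digits and a decimal point.
TESS_CONFIG = "--psm 7 -c tessedit_char_whitelist=0123456789."


def parse_number(text):
    """Return the first number found in OCR output as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def read_fields(image_path, boxes=FIELD_BOXES):
    """OCR each fixed region of one screenshot and return {field: value}."""
    # Imported here so parse_number is usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    values = {}
    with Image.open(image_path) as img:
        gray = img.convert("L")  # grayscale
        for name, box in boxes.items():
            crop = gray.crop(box)
            # Enlarging small screen fonts often helps Tesseract.
            crop = crop.resize((crop.width * 3, crop.height * 3), Image.LANCZOS)
            text = pytesseract.image_to_string(crop, config=TESS_CONFIG)
            values[name] = parse_number(text)
    return values


def read_folder(folder):
    """Apply read_fields to every PNG in a folder, keyed by file name."""
    return {p.name: read_fields(p) for p in sorted(Path(folder).glob("*.png"))}
```

Cropping first means the surrounding labels never reach the OCR engine, so mixed letters and digits elsewhere in the screenshot should not matter; a field that cannot be read comes back as `None` rather than a guessed value.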

Also, opening the image with Google Docs (linked here) shows how well Google extracts the text. Can an automated program use Google Docs for this conversion and then scrape the desired values as described above? Either approach would be acceptable, and any attempt at a solution would be highly appreciated.
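For the Google Docs route, a rough sketch of what such automation might look like with the Google Drive API v3 Python client, which can run OCR when an image is uploaded with a Google Docs target type. This assumes `service` was already built with `googleapiclient.discovery.build("drive", "v3", credentials=creds)` using valid credentials; `ocr_with_drive`, `numbers_in_text` and the temporary file name are hypothetical names chosen for illustration.

```python
import re

# Target type that asks Drive to convert the upload into a Google Doc.
DOC_MIME = "application/vnd.google-apps.document"


def numbers_in_text(text):
    """Return every decimal number (digits.digits) in the text, in order."""
    return [float(m) for m in re.findall(r"\d+\.\d+", text)]


def ocr_with_drive(service, image_path):
    """Upload a PNG as a Google Doc, export it as plain text, then delete it."""
    # Imported lazily so numbers_in_text works without the Google client.
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(image_path, mimetype="image/png")
    created = service.files().create(
        body={"name": "ocr-temp", "mimeType": DOC_MIME},
        media_body=media,
        fields="id",
    ).execute()
    try:
        data = service.files().export(
            fileId=created["id"], mimeType="text/plain"
        ).execute()
    finally:
        # Remove the temporary document so 2500 uploads do not pile up.
        service.files().delete(fileId=created["id"]).execute()
    return data.decode("utf-8-sig") if isinstance(data, bytes) else data
```

Note that `numbers_in_text` returns every decimal in the page, including ones like the 4.1 in the version string, so selecting the four values I need would still depend on their order or on nearby labels.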

[edit]: The question suggested in the comments was insightful but did not work for me: the given code could not find the contours of the numbers in the image, so the model could not be trained.

Vatsal
  • Does this answer your question? [Simple Digit Recognition OCR in OpenCV-Python](https://stackoverflow.com/questions/9413216/simple-digit-recognition-ocr-in-opencv-python) – Stalin Gino Jun 17 '20 at 12:00
  • Thanks for the link. I tried this and a few other code examples that use this method, but the program isn't even detecting the numbers in the image (or drawing those red boxes around them). So I can't even train the model using this, let alone test it. – Vatsal Jun 17 '20 at 12:26
  • Have you tried compressing the image or reducing its quality? The screenshot may be too perfect (pixelated digits) for the model. – Stalin Gino Jun 18 '20 at 08:43
  • I did, and someone suggested that the issue with that code is that it won't work if the image contains both letters and numbers, which is the case for me. (I even took a photo of the screen and tried using that instead.) Could the second approach mentioned in the question be tenable? – Vatsal Jun 19 '20 at 10:29
