The image in question is linked here. I have tried using PyTesseract for this. It extracts the words well, but it does not read the numbers with any acceptable accuracy; in fact, it does not pick up the numbers I need at all. I want to write a program that reads the numbers at four particular locations in an image and stores them in a structured variable (a list, a dictionary, etc.). Since I need to do this for roughly 2,500 screenshots, I cannot pick out the required numbers from the OCR output by hand, even if it started reading them correctly. The following output was returned by PyTesseract for the image described above:
```
Activities Boyer STA
Candle Version 4.1-9 IUAC, N.Delhi - BUILD (Tuesday 24 October 2017 04:
CL-F41. Markers:
—
896 13) 937.0
Back
Total,
Peak-1
Lprnenea dais cinasedl
Ee
1511 Show State
Proceed Append to File
```
The code used to produce this output was:
```python
try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path).
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'
print(pytesseract.image_to_string(Image.open('C:/Users/vatsa/Desktop/Screenshot from 2020-06-15 21-41-06.png')))
```
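One direction that may help, offered as an untested sketch rather than a known fix: Tesseract's default mode treats the whole screenshot as a page of prose, so small numbers inside UI widgets are easily skipped or garbled. Cropping to the number first, enlarging and binarising the crop, and telling Tesseract to expect a single line (`--psm 7`) restricted to digits via `tessedit_char_whitelist` often improves digit recognition. The `ocr_number` and `parse_number` names, the `scale` and `threshold` defaults, and the assumption of dark text on a lighter background are all hypothetical choices to tune.

```python
import re

# --psm 7 tells Tesseract the image is a single line of text; the whitelist
# restricts it to digits, '.' and '-' (some Tesseract 4.0 builds ignore the
# whitelist when the LSTM engine is used).
DIGIT_CONFIG = "--psm 7 -c tessedit_char_whitelist=0123456789.-"
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_number(text):
    """Return the first decimal number in OCR output as a float, or None."""
    match = NUMBER_RE.search(text.replace(",", ""))
    return float(match.group()) if match else None


def ocr_number(path, box, scale=3, threshold=150):
    """OCR the single number inside box = (left, upper, right, lower) pixels."""
    # Imported here so that parse_number works without Pillow or Tesseract.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        crop = img.crop(box).convert("L")  # grayscale copy of the region
    # Enlarge small UI text, then binarise (assumes dark text on a light background).
    crop = crop.resize((crop.width * scale, crop.height * scale))
    crop = crop.point(lambda p: 255 if p > threshold else 0)
    text = pytesseract.image_to_string(crop, config=DIGIT_CONFIG)
    return parse_number(text)
```

Usage would look like `ocr_number("screenshot.png", (100, 200, 180, 220))`, where the box is a placeholder to be measured once in an image editor.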
Referring to the image: in this screenshot, the values I need are 146.47, 915.16, 354.5 and 18.89. I want to extract the numbers at those same four positions from every screenshot and save them, for example, as a list. How can I achieve this in Python?
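A minimal sketch of the batch part, assuming every screenshot has the same window layout so the four values always sit inside the same pixel boxes. The `REGIONS` coordinates and the `extract_row`/`process_folder` helpers are hypothetical; `read_number` can be any function that OCRs a cropped image and returns a float, with a simple Tesseract-based default.

```python
import csv
import glob
import re

# Hypothetical (left, upper, right, lower) boxes for the four values,
# i.e. where 146.47, 915.16, 354.5 and 18.89 appear in the sample screenshot.
REGIONS = [
    (100, 200, 180, 220),
    (100, 230, 180, 250),
    (100, 260, 180, 280),
    (100, 290, 180, 310),
]


def tesseract_number(crop):
    """OCR a cropped PIL image that should contain exactly one number."""
    import pytesseract  # lazy import: only needed when OCR actually runs

    crop = crop.convert("L")
    crop = crop.resize((crop.width * 3, crop.height * 3))
    text = pytesseract.image_to_string(
        crop, config="--psm 7 -c tessedit_char_whitelist=0123456789.-")
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def extract_row(image, regions=REGIONS, read_number=tesseract_number):
    """Return a list with one value per region for a single image."""
    return [read_number(image.crop(box)) for box in regions]


def process_folder(pattern, out_csv, regions=REGIONS):
    """OCR every file matching `pattern`; write one CSV row per file."""
    from PIL import Image

    results = {}
    for path in sorted(glob.glob(pattern)):
        with Image.open(path) as img:
            results[path] = extract_row(img, regions)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file"] + [f"value_{i + 1}" for i in range(len(regions))])
        for path, values in results.items():
            writer.writerow([path] + values)
    return results
```

Calling `process_folder("screenshots/*.png", "values.csv")` (placeholder paths) would return a dictionary mapping each file to its list of four values and also write them to a CSV. Keeping the OCR step as a parameter makes it possible to swap in a different recogniser without touching the batch loop.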
Also, opening the image in Google Docs (linked here) shows how well Google extracts the text. Could an automated program use Google Docs to perform this conversion and then scrape the desired values as described above? Either approach would be acceptable, and any attempt at a solution would be much appreciated.
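Scripting Google Docs itself is awkward. A more direct route to Google's OCR is the Cloud Vision API, which is a separate product from Docs, so this is a swapped-in technique; it needs a Google Cloud project with the Vision API enabled and credentials configured. Its `text_detection` response lists each detected word with a bounding polygon, so the four values can be selected by position rather than by scraping text. A sketch under those assumptions; the helper names and box coordinates are hypothetical.

```python
import re

NUMBER_RE = re.compile(r"^-?\d+(?:\.\d+)?$")


def words_with_centers(path):
    """Run Cloud Vision OCR; return (text, center_x, center_y) for each word."""
    from google.cloud import vision  # pip install google-cloud-vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    words = []
    # text_annotations[0] holds the full text; the remaining entries are words.
    for ann in response.text_annotations[1:]:
        xs = [v.x for v in ann.bounding_poly.vertices]
        ys = [v.y for v in ann.bounding_poly.vertices]
        words.append((ann.description, sum(xs) / len(xs), sum(ys) / len(ys)))
    return words


def numbers_in_boxes(words, boxes):
    """For each (left, upper, right, lower) box, return the first numeric word
    whose center lies inside it, or None if there is none."""
    values = []
    for left, upper, right, lower in boxes:
        found = None
        for text, cx, cy in words:
            if left <= cx <= right and upper <= cy <= lower and NUMBER_RE.match(text):
                found = float(text)
                break
        values.append(found)
    return values
```

If Vision happens to split a number such as 146.47 into several tokens, the matching step would need to join neighbouring tokens first.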
[edit]: The question suggested in the comments was insightful, but it did not solve the problem: its code could not find the contours of the numbers in the image, so the model could not be trained.
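A common reason `cv2.findContours` finds nothing useful on screenshots is polarity: OpenCV treats non-zero (white) pixels as the objects, while text in a UI is usually dark on a light background. A sketch, assuming dark digits on a lighter background and OpenCV 3 or 4, that inverts the image with Otsu thresholding before searching for contours. The `min_h`/`max_h` limits are guesses to tune, and `digit_boxes`/`keep_digit_boxes` are hypothetical names.

```python
def keep_digit_boxes(boxes, min_h=8, max_h=60):
    """Drop boxes too small or too tall to be a digit; sort left to right."""
    kept = [b for b in boxes if min_h <= b[3] <= max_h]  # b = (x, y, w, h)
    return sorted(kept, key=lambda b: (b[0], b[1]))


def digit_boxes(path, min_h=8, max_h=60):
    """Return bounding boxes (x, y, w, h) of digit-like blobs in an image."""
    import cv2  # pip install opencv-python

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # THRESH_BINARY_INV turns dark text white, which findContours expects;
    # THRESH_OTSU chooses the threshold value automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3 (3 results) and 4 (2 results).
    contours = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    return keep_digit_boxes([cv2.boundingRect(c) for c in contours], min_h, max_h)
```

Running this on the cropped value regions rather than the full screenshot keeps labels and window decorations out of the training data. Note that the decimal point is shorter than `min_h` and is filtered out, so its position would have to be recovered separately.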