
I am trying to extract numbers from in-game screenshots.


I'm trying to extract:

98
3430
5/10

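from PIL import Image
import pytesseract

# Path to the screenshot and to the Tesseract executable (Windows install)
image = "D:/img/New folder (2)/1.png"
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'

# --psm 5 tells Tesseract to assume a single uniform block of vertically aligned text
text = pytesseract.image_to_string(Image.open(image), lang='eng', config='--psm 5')
print(text)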

The output is gibberish:

‘t hl) keteeeees
ek pSlaerenen
JU) pgrenmnreserenny
Rates B
d dali eas. 5
cle aM (Sores
|, S| pgranmrerererecons
a cee 3
pea 3
oS :
(geo eenee
ey
=
es A
Cesar
  • Can you upload the image? – Arun Mar 08 '20 at 10:48
  • The image is at https://i.imgur.com/QSOcVRF.png – Cesar Mar 08 '20 at 11:01
  • You are going to need a stronger preprocessing pipeline than that to detect the numbers correctly. You must segment the characters as cleanly as possible; also, the text is warped, so you will need to unwarp it. – stateMachine Mar 09 '20 at 02:39

2 Answers


Okay, so I tried converting it to grayscale, inverting the contrast, and using different thresholds, but all of these were fairly inaccurate. The problem seems to be the tilted, smaller numbers. Do you happen to have a higher-resolution image? The most accurate result I could get was with the following code:

import cv2
import pytesseract
import imutils

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img = cv2.imread('D:/img/New folder (2)/1.png')  # the original screenshot
img = imutils.resize(img, width=1400)  # fixed width so the crop coordinates below match
crop = img[340:530, 100:400]  # rows 340-530, columns 100-400: the region with the numbers

# Restrict recognition to digits and the slash in "5/10"
data = pytesseract.image_to_string(crop, config='--psm 1 --oem 3 -c tessedit_char_whitelist=0123456789/')
print(data)

cv2.imshow('crop', crop)
cv2.waitKey()

Otherwise, I recommend one of the methods described in the similar question or in this one.

Kokokoko

If the text is surrounded by graphical designs, Tesseract struggles a lot.

Instead of Tesseract, try using findContours in OpenCV (after a little blurring and dilating).

You will get bounding boxes, and some of them may cover that text as well.

lnx