I am new to pytesseract. I want to extract the user IDs from the image below.

Image

The code I am using is:

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\80141219\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'

image = cv2.imread(r'C:\Desktop\dormancyIssue\testImage.jpg', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.waitKey()

and the output is:

wecy| H+ op Et >A EEE
@ Fle] x |
Fite Adion View WN (Gencal
| ale] xX .
x x & ‘
oraputer Manage
4B System Tools TT -
Gf Event Viewer
> gil Shared Folder sities
4B Local Users arg | Members:
To Users Bor 109033
3 Groups | | Soser5405
» @ Performance | | SPs0nss658
Bl device Menagy | | SE70z1611
> ap Windows Senff | | SE 7102
z Bons
Disk Manage
> iy Services and App}
Guages et goin raven
pe) ts) Cerone] ret ster
B& * & &°e «hs

I'm not even sure where some of the data in the output is coming from. Note that I have also tried cropping the image to include only the IDs, but to no avail.

I'm wondering if someone might have a solution or point me in the right direction.

Thanks!!

PythonBeginner
  • Small font, low resolution, severe JPG artifacts... You'll need a better input image. I doubt that there are reasonable pre-processing steps to identify the numbers in question. – HansHirse Jul 02 '21 at 07:59
  • @HansHirse okay thank you, I'll look into getting a better image. – PythonBeginner Jul 02 '21 at 08:04
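
If a better capture isn't available, enlarging the image before OCR is sometimes worth a try. The following is only a rough sketch, not a guaranteed fix for this screenshot: the helper names `upscale_factor` and `prepare_for_ocr` and the 30-pixel target text height are assumptions made for illustration. It also thresholds with `THRESH_BINARY` rather than `THRESH_BINARY_INV`, on the assumption that the screenshot has dark text on a light background, which recent Tesseract versions are generally said to prefer.

```python
def upscale_factor(text_height_px, target_height_px=30):
    """Scale factor that brings text up to the target height (never shrinks).

    target_height_px=30 is an assumed rule-of-thumb value, not a Tesseract setting.
    """
    if text_height_px <= 0:
        raise ValueError("text_height_px must be positive")
    return max(1.0, target_height_px / text_height_px)


def prepare_for_ocr(gray, text_height_px):
    """Enlarge a grayscale image and binarize it with Otsu's method."""
    # Imported here so upscale_factor() can be used without OpenCV installed.
    import cv2

    factor = upscale_factor(text_height_px)
    # Cubic interpolation usually keeps small glyphs smoother than the default.
    enlarged = cv2.resize(gray, None, fx=factor, fy=factor,
                          interpolation=cv2.INTER_CUBIC)
    # THRESH_BINARY keeps dark text dark on a white background.
    _, binary = cv2.threshold(enlarged, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

For example, if the IDs are roughly 10 pixels tall, `prepare_for_ocr(image, 10)` would enlarge the image three times before thresholding. Upscaling cannot restore detail lost to JPG compression, so a clean PNG capture is still the better option.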

1 Answer

I see that your IDs consist only of digits. Here is a solution that makes Tesseract recognize digits only:

https://stackoverflow.com/a/46589648/7383731
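
As a minimal sketch of restricting Tesseract to digits with pytesseract: this assumes your installed Tesseract honors the `tessedit_char_whitelist` config variable (it has been reported that the LSTM engine in 4.0 ignored it and that 4.1 restored it, so check your version). The names `DIGITS_ONLY`, `find_ids`, and `read_ids`, and the five-digit minimum length, are hypothetical choices for illustration.

```python
import re

# Same page segmentation mode as in the question, plus a digits-only whitelist.
DIGITS_ONLY = "--psm 6 -c tessedit_char_whitelist=0123456789"


def find_ids(ocr_text, min_digits=5):
    """Return every run of at least min_digits consecutive digits in the text."""
    return re.findall(r"\d{%d,}" % min_digits, ocr_text)


def read_ids(image_path):
    """OCR an image with the digit whitelist and return the ID-like numbers."""
    # Imported here so find_ids() stays usable without OpenCV/Tesseract installed.
    import cv2
    import pytesseract

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(image_path)
    text = pytesseract.image_to_string(gray, lang="eng", config=DIGITS_ONLY)
    return find_ids(text)
```

Filtering with `find_ids` afterwards helps because a whitelist can force non-digit shapes (icons, borders) into stray digits; keeping only longer runs discards most of that noise. It will not help if the digits themselves are misread, which is likely on a small, heavily compressed image.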

Rizquuula
  • Haven't heard of whitelisting numbers. That's interesting. I've heard that setting it to a language that doesn't hunt for English characters can help, such as Chinese. But yea, def'ly don't screencap with .JPG - use .PNG or .TIFF instead. – Doyousketch2 Jul 02 '21 at 08:48