[Image: sample form]

I'm learning AI/ML and trying to get text from this sample form.

import re  # needed for the re.sub() cleanup further down
import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\Pranav\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'

image = cv2.imread('image2.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)

x,y,w,h = 393, 531, 837, 80
firstROI = blur[y:y+h,x:x+w]
firstname = pytesseract.image_to_string(firstROI, lang='eng', config='--psm 6')
print(firstname)
firstname = re.sub(r'[^\w]', '', firstname)

cv2.imshow('image', firstROI)
cv2.waitKey()
cv2.destroyAllWindows()

With the code above, I can read normal printed text on the white background, but not the text inside the grey-background boxes. For example, the real value of the first-name box is "Andrew", but I get only "oe".

firstROI looks like this: [image]

Following Freddy's comment, I went through this link and tried the following code, but it still gives no output.

from tesserocr import PyTessBaseAPI, PSM, OEM
api = PyTessBaseAPI(psm=PSM.AUTO_OSD, lang='eng', path=r'C:\Users\Pranav\tessdata-master')
images = ['andrew1.png', 'andrew2.png', 'test1.png']

for img in images:
    api.SetImageFile(img)
    print (api.GetUTF8Text())
    print (api.AllWordConfidences())

These are the sample images: [image: andrew1] [image] [image]

It can read the text from the third image only (Demographics). Please help me read the text from the gray-background images (Andrew).

Arun
  • Setting an appropriate Page segmentation mode will help detect the characters. https://stackoverflow.com/questions/32584628/tesseract-ocr-returns-null-string – Freddy Daniel Apr 15 '21 at 17:15
  • @FreddyDaniel It's still not reading the text. I edited the question again, please read it – Arun Apr 17 '21 at 15:02

1 Answer

This link gave me the answer: remove the noise from the image background before running OCR.

Arun