I am trying to read the number below a barcode in an image. The same code works fine with some other images, but not with this one. Here's the image: (image)

Here is the code so far:

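import re

import cv2
import pytesseract

def readNumber():
    # sTemp is the path to the image file, defined elsewhere in my script
    image = cv2.imread(sTemp)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    # Otsu threshold (inverted), then a morphological opening to drop small specks
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    # Invert back to dark text on a light background for Tesseract
    invert = 255 - opening
    data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6 -c tessedit_char_whitelist=0123456789')
    print(data)
    try:
        # Raw string so \d and \D are regex escapes, not string escapes
        data = re.findall(r'(\d{9})\D', data)[0]
    except IndexError:
        # No 9-digit run found in the OCR output
        data = ''
    return data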

And I call it with this line:

readNumber()

Here's another example: (image)

This is the last example, I promise: (image)

I tried this with the third example and it works:

img = cv2.imread("thisimage.png")
blur = cv2.GaussianBlur(img, (3,3), 0)
#gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(blur)
print(txt)

But how can I adapt the code so that it works for all three cases? I tried the code below, but couldn't get the third case to work:

import pytesseract, cv2, re

def readNumber(img):
    img = cv2.imread(img)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    try:
        txt = pytesseract.image_to_string(gry)
        #txt  = re.findall('(\d{9})\D', txt)[0]
    except:
        thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
        txt = pytesseract.image_to_string(thr, config="digits")
        #txt  = re.findall('(\d{9})\D', txt)[0]

    return txt

# M5Pr5         191876320
# RWgrP         202131290
# 6pVH4         193832560
print(readNumber('M5Pr5.png'))
YasserKhalil

1 Answer

You don't need any preprocessing or configuration for this input image, since there are no artifacts in it.

import cv2
import pytesseract

img = cv2.imread("RWgrP.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(gry)
print(txt)

Result:

202131290

My Tesseract version is 4.1.1 (the output of `pytesseract.get_tesseract_version()`).
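The pip package and the OCR engine carry separate version numbers, which is easy to mix up: the pytesseract wrapper on PyPI is at 0.3.x, while 4.1.1 is the version of the Tesseract engine itself. A minimal sketch for printing both is below; `ocr_versions` is a hypothetical helper name, and it returns `None` for anything that is not installed rather than failing.

```python
from importlib import metadata


def ocr_versions():
    """Return the wrapper and engine versions, or None where unavailable."""
    info = {"pytesseract": None, "tesseract": None}
    try:
        # Version of the pip package (the Python wrapper)
        info["pytesseract"] = metadata.version("pytesseract")
    except metadata.PackageNotFoundError:
        pass
    try:
        import pytesseract
        # Version of the Tesseract engine binary that the wrapper calls
        info["tesseract"] = str(pytesseract.get_tesseract_version())
    except Exception:
        # Wrapper not installed, or the tesseract executable was not found
        pass
    return info


print(ocr_versions())
```

If the engine version shows an alpha build, upgrading the Tesseract installation itself (not the pip package) is what changes it.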

Update-1


The second image requires preprocessing.

If you apply adaptive thresholding, you get:

(image)

But the OCR output then also contains unwanted characters. If you set the configuration to digits, the result will be:

193832560
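As a complement to the `digits` configuration, the OCR text can also be filtered after the fact. The sketch below assumes the number is always exactly nine digits long, as in the three examples; `extract_number` is a hypothetical helper, and the sample string is made up for illustration. Unlike a pattern that requires a trailing non-digit, the lookarounds here also match when the number is the last thing in the string.

```python
import re

# Exactly nine digits, not embedded in a longer run of digits.
NINE_DIGITS = re.compile(r"(?<!\d)\d{9}(?!\d)")


def extract_number(ocr_text: str) -> str:
    """Return the first standalone 9-digit number in ocr_text, or '' if none."""
    match = NINE_DIGITS.search(ocr_text)
    return match.group(0) if match else ""


# Made-up noisy OCR output, for illustration only
print(extract_number("| 193832560 .\n"))
```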

Update-2


For the third image, you need to change the adaptive method. Using ADAPTIVE_THRESH_MEAN_C will result in:

191876320

Everything else stays the same.

Code (shown for the second image; for the third image only the adaptive method changes):

import cv2
import pytesseract

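img = cv2.imread("6pVH4.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Second image: Gaussian adaptive threshold, block size 51, constant 4.
# For the third image (M5Pr5.png), use cv2.ADAPTIVE_THRESH_MEAN_C instead.
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
# "digits" restricts the output to numeric characters
txt = pytesseract.image_to_string(thr, config="digits")
print(txt)
# Show the thresholded image; press any key to close the window
cv2.imshow("thr", thr)
cv2.waitKey(0)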
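To handle all three images in one function, one possible approach is to try the variants in order and accept the first result that looks like a valid number. Note that `image_to_string` does not raise an exception when it merely misreads the text, so a `try`/`except` around it will not trigger the fallback; the output has to be checked instead. The sketch below assumes the number is always nine digits; `first_valid_number` and `read_number` are hypothetical names, and the OpenCV/Tesseract imports are kept inside `read_number` so the selection logic can be exercised on its own.

```python
import re
from typing import Callable, Iterable, Optional


def first_valid_number(attempts: Iterable[Callable[[], str]]) -> Optional[str]:
    """Run each OCR attempt in order; return the first 9-digit number found."""
    for attempt in attempts:
        match = re.search(r"(?<!\d)\d{9}(?!\d)", attempt())
        if match:
            return match.group(0)
    return None


def read_number(path: str) -> Optional[str]:
    # Imported here so first_valid_number stays usable without OpenCV/Tesseract
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    def adaptive(method: int) -> str:
        # Same block size (51) and constant (4) as in the code above
        thr = cv2.adaptiveThreshold(gry, 255, method, cv2.THRESH_BINARY, 51, 4)
        return pytesseract.image_to_string(thr, config="digits")

    # Cheapest variant first; later ones only run if earlier ones fail
    return first_valid_number([
        lambda: pytesseract.image_to_string(gry),
        lambda: adaptive(cv2.ADAPTIVE_THRESH_GAUSSIAN_C),
        lambda: adaptive(cv2.ADAPTIVE_THRESH_MEAN_C),
    ])
```

Usage would be `print(read_number("M5Pr5.png"))`. A wrong but well-formed nine-digit read from an early variant would still be accepted, so this is not a guarantee of correctness, only a way to chain the three cases.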
Ahmet
  • Thank you very much. I tried the code but I got `202131280` instead of `202131290`. How can this be improved? – YasserKhalil Feb 10 '21 at 21:36
  • When I edited this line `txt = pytesseract.image_to_string(gry, lang='eng', config='--psm 6 --oem -c tessedit_char_whitelist=0123456789')`, I got correct result. Thank you very much. – YasserKhalil Feb 10 '21 at 21:39
  • Here is my [result](https://imgur.com/a/EqQ4u9h). Maybe updating or downgrading to version 4.1.1 will make your job easier. Anyway, I'm glad you solved the problem. – Ahmet Feb 10 '21 at 21:40
  • I have used this line to upgrade `pip install pytesseract==4.1.1` but when using this line to check the version `pip freeze | findstr pytesseract`, I found it `pytesseract==0.3.6` – YasserKhalil Feb 10 '21 at 21:48
  • The latest version is [0.3.7](https://pypi.org/project/pytesseract/). 4.1.1 is the output of `print(pytesseract.get_tesseract_version())` – Ahmet Feb 10 '21 at 21:50
  • I got this `4.0.0-alpha.20180109`, but how do I upgrade to 4.1.1 to test your code? – YasserKhalil Feb 10 '21 at 21:51
  • You know `alpha` is not a stable-version. I guess `pip install pytesseract` will upgrade to the 0.3.7 or (4.1.1) – Ahmet Feb 10 '21 at 21:53
  • Thanks a lot. Can you please have a look at this related question https://stackoverflow.com/questions/66146007/upgrade-pytesseract-in-python – YasserKhalil Feb 10 '21 at 22:24
  • I have put another example that didn't work for me. Can you please have a look? How can I be able to widen the width of the image (I think the numbers should be separated by some spaces in between)? – YasserKhalil Feb 11 '21 at 07:08
  • I've updated my answer, see `Update-1` for the answer – Ahmet Feb 11 '21 at 07:54
  • Amazing. Thank you very much. Best and Kind Regards. – YasserKhalil Feb 11 '21 at 08:01
  • Sorry for disturbing you. When I tried the first image with the second code (the update), I got an empty string. Is there a way to handle both cases in one piece of code? – YasserKhalil Feb 11 '21 at 08:08
  • I could implement both cases in the code. Thank you very much. – YasserKhalil Feb 11 '21 at 08:18
  • I have added a third example and I promise this to be the last one. I tried both codes with it but didn't return the result correctly. – YasserKhalil Feb 11 '21 at 08:31
  • Can you please have a look at the code in this link https://stackoverflow.com/questions/66151483/ ? Please advise me if there is a better approach. – YasserKhalil Feb 11 '21 at 12:43
  • No problem for me, ask as much as you can. You need to change the `adaptiveMethod` for the third image. See the answer under `Update-2` – Ahmet Feb 11 '21 at 12:45
  • The issue is that you need a deep-learning-based method, one that is invariant to all changes in the input image. Maybe you should google `east-text-detector` – Ahmet Feb 11 '21 at 12:46
  • Thank you very very much for the great support in that issue. Thanks a lot. I wish I could increase the rep points again and again and again ..!! – YasserKhalil Feb 11 '21 at 14:46
  • Can you please have a look at this question https://stackoverflow.com/questions/66183322/get-numbers-from-cropped-image-pytesseract? – YasserKhalil Feb 13 '21 at 08:49