Read text below barcode pytesseract python

Question

I am trying to read the number printed below a barcode in an image. The same code works fine on some other images, but not on this one. Here's the image:

And here is the code so far:

import re

import cv2
import pytesseract

def readNumber():
    # sTemp (the path of the input image) is defined elsewhere
    image = cv2.imread(sTemp)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    invert = 255 - opening
    data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6 -c tessedit_char_whitelist=0123456789')
    print(data)
    try:
        # raw string so \d and \D reach the regex engine unchanged
        data = re.findall(r'(\d{9})\D', data)[0]
    except IndexError:
        # no nine-digit run followed by a non-digit was found
        data = ''
    return data

And I call it with this line:

readNumber()

Here's another example

This is the last example I promise

I tried this with the third example and it works

import cv2
import pytesseract

img = cv2.imread("thisimage.png")
blur = cv2.GaussianBlur(img, (3,3), 0)
#gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(blur)
print(txt)

But how can I adapt the code so that it works for all three cases? I tried the following code but couldn't get the third case to work:

import pytesseract, cv2, re

def readNumber(img):
    img = cv2.imread(img)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    try:
        txt = pytesseract.image_to_string(gry)
        #txt  = re.findall('(\d{9})\D', txt)[0]
    except:
        thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
        txt = pytesseract.image_to_string(thr, config="digits")
        #txt  = re.findall('(\d{9})\D', txt)[0]

    return txt

# M5Pr5         191876320
# RWgrP         202131290
# 6pVH4         193832560
print(readNumber('M5Pr5.png'))

Ahmet · Accepted Answer · 2021-02-11T12:44:09.473

You don't need any preprocessing or configuration for this input image, since there are no artifacts in it.

import cv2
import pytesseract

img = cv2.imread("RWgrP.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(gry)
print(txt)

Result:

202131290

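My Tesseract version is 4.1.1 (the output of `pytesseract.get_tesseract_version()`; the pytesseract package itself has a separate version number).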
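To see which versions are in play on a given machine, something like the following sketch can help. It assumes Python 3.8+ (for `importlib.metadata`); the helper name `ocr_versions` is made up for illustration, and any value that cannot be determined is left as `None`.

```python
from importlib import metadata


def ocr_versions():
    """Return the pip package version and the Tesseract engine version.

    Hypothetical helper: a value stays None when it cannot be determined.
    """
    versions = {"pytesseract": None, "tesseract": None}
    try:
        # Version of the Python wrapper installed with pip
        versions["pytesseract"] = metadata.version("pytesseract")
    except metadata.PackageNotFoundError:
        return versions
    try:
        import pytesseract

        # Version of the Tesseract binary that the wrapper calls
        versions["tesseract"] = str(pytesseract.get_tesseract_version())
    except Exception:
        # e.g. the Tesseract binary is not installed or not on PATH
        pass
    return versions


print(ocr_versions())
```

The first value changes with `pip install`; the second only changes when the Tesseract program itself is installed or updated.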

Update-1

The second image requires preprocessing.

Applying adaptive thresholding makes the number readable, but the OCR output then also contains unwanted characters. If you additionally set the configuration to digits, the result will be:

193832560
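If stray characters still get through (for instance on a setup where the `digits` config is not picked up, which is an assumption about your installation), the text can also be filtered after OCR. A minimal sketch: the function names are hypothetical, and the default length of 9 is taken from the `\d{9}` regex in the question.

```python
import re


def digit_runs(ocr_text):
    """Return every uninterrupted run of digits found in the OCR text."""
    return re.findall(r"\d+", ocr_text)


def pick_by_length(ocr_text, length=9):
    """Keep only the digit runs that have exactly the expected length."""
    return [run for run in digit_runs(ocr_text) if len(run) == length]


# Made-up noisy OCR output, for illustration only
sample = "|||| 1 93\n193832560 ~\n"
print(pick_by_length(sample))
```

This does not replace the `digits` configuration: it only discards noise from the recognised text and cannot fix a digit that was misread.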

Update-2

For the third image, you need to change the adaptive method. Using ADAPTIVE_THRESH_MEAN_C will result in:

191876320

The rest of the code stays the same.
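For intuition about what the mean method does: with `THRESH_BINARY`, each pixel is compared against the mean of its `blockSize` x `blockSize` neighbourhood minus the constant `C`, while the Gaussian method uses a Gaussian-weighted mean instead. The pure-Python sketch below illustrates that formula on a tiny image. It is not OpenCV's implementation: the function name is hypothetical, borders are handled by repeating edge pixels, and rounding may differ from `cv2.adaptiveThreshold`.

```python
def adaptive_threshold_mean(gray, block_size, c, max_value=255):
    """Mean adaptive threshold (binary) on a list of rows of 0..255 ints."""
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-half, half + 1):
                yy = min(max(y + dy, 0), height - 1)  # repeat edge rows
                for dx in range(-half, half + 1):
                    xx = min(max(x + dx, 0), width - 1)  # repeat edge columns
                    total += gray[yy][xx]
            # Local threshold: neighbourhood mean minus the constant C
            threshold = total / (block_size * block_size) - c
            row.append(max_value if gray[y][x] > threshold else 0)
        out.append(row)
    return out


# 5x5 light background (200) with one dark pixel (50) in the centre
image = [[200] * 5 for _ in range(5)]
image[2][2] = 50
for line in adaptive_threshold_mean(image, block_size=3, c=4):
    print(line)
```

A positive `C` pushes flat background areas to white, because a pixel equal to its local mean is still above "mean minus C"; only pixels clearly darker than their surroundings turn black.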

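Code (this is the version for the second image, 6pVH4.png; for the third image, replace cv2.ADAPTIVE_THRESH_GAUSSIAN_C with cv2.ADAPTIVE_THRESH_MEAN_C):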

import cv2
import pytesseract

img = cv2.imread("6pVH4.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
txt = pytesseract.image_to_string(thr, config="digits")
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)
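To handle all three images with one function, one option is to try the three variants in order and keep the first result that contains a nine-digit number. `image_to_string` returns whatever text it recognises rather than raising on a poor read, so the fallback has to be driven by checking the text, not by `try`/`except`. This is only a sketch: the names `first_match` and `read_number` are hypothetical, the nine-digit pattern comes from the regex in the question, and the order of attempts is an assumption that has not been verified against the original images.

```python
import re


def first_match(ocr_attempts, pattern=r"(?<!\d)\d{9}(?!\d)"):
    """Call zero-argument OCR callables in order; return the first hit.

    A hit is a run of exactly nine digits. Later attempts are not run
    once one attempt succeeds. Returns '' when nothing matches.
    """
    regex = re.compile(pattern)
    for attempt in ocr_attempts:
        match = regex.search(attempt())
        if match:
            return match.group(0)
    return ""


def read_number(path):
    # Imported here so first_match stays usable without OpenCV/Tesseract
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    def thresholded(method):
        # Same parameters as in the answer; only the adaptive method varies
        thr = cv2.adaptiveThreshold(gry, 255, method, cv2.THRESH_BINARY, 51, 4)
        return pytesseract.image_to_string(thr, config="digits")

    return first_match([
        lambda: pytesseract.image_to_string(gry),             # first image
        lambda: thresholded(cv2.ADAPTIVE_THRESH_GAUSSIAN_C),  # second image
        lambda: thresholded(cv2.ADAPTIVE_THRESH_MEAN_C),      # third image
    ])
```

Usage would be `print(read_number("M5Pr5.png"))`. Note that a misread which still happens to contain nine digits is accepted as-is; the pattern checks the format of the number, not its correctness.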

edited Feb 11 '21 at 12:44

answered Feb 10 '21 at 20:53


Thank you very much. I tried the code but I got `202131280` instead of `202131290`. How can this be improved? – YasserKhalil Feb 10 '21 at 21:36
When I edited this line `txt = pytesseract.image_to_string(gry, lang='eng', config='--psm 6 --oem -c tessedit_char_whitelist=0123456789')`, I got correct result. Thank you very much. – YasserKhalil Feb 10 '21 at 21:39
Here is my [result](https://imgur.com/a/EqQ4u9h). Maybe updating or downgrading to version 4.1.1 will make your job easier. Anyway, I'm glad you solved the problem. – Ahmet Feb 10 '21 at 21:40
I have used this line to upgrade `pip install pytesseract==4.1.1` but when using this line to check the version `pip freeze | findstr pytesseract`, I found it `pytesseract==0.3.6` – YasserKhalil Feb 10 '21 at 21:48
The latest version is [0.3.7](https://pypi.org/project/pytesseract/). 4.1.1 is the output of `print(pytesseract.get_tesseract_version())` – Ahmet Feb 10 '21 at 21:50
I got this `4.0.0-alpha.20180109` but how I upgrade to 4.1.1 to test your code. – YasserKhalil Feb 10 '21 at 21:51
You know `alpha` is not a stable-version. I guess `pip install pytesseract` will upgrade to the 0.3.7 or (4.1.1) – Ahmet Feb 10 '21 at 21:53
Thanks a lot. Can you please have a look at this related question https://stackoverflow.com/questions/66146007/upgrade-pytesseract-in-python – YasserKhalil Feb 10 '21 at 22:24
I have put another example that didn't work for me. Can you please have a look? How can I be able to widen the width of the image (I think the numbers should be separated by some spaces in between)? – YasserKhalil Feb 11 '21 at 07:08
I've updated my answer, see `Update-1` for the answer – Ahmet Feb 11 '21 at 07:54
Amazing. Thank you very much. Best and Kind Regards. – YasserKhalil Feb 11 '21 at 08:01
Sorry for disturbing you. When I tried the first image with the second code (The update), I got empty string. Is there a way to handle both cases in one code? – YasserKhalil Feb 11 '21 at 08:08
I could implement both cases in the code. Thank you very much. – YasserKhalil Feb 11 '21 at 08:18
I have added a third example and I promise this to be the last one. I tried both codes with it but didn't return the result correctly. – YasserKhalil Feb 11 '21 at 08:31
Can you please have a look at the code in this link https://stackoverflow.com/questions/66151483/ ? Please advise me if there is a better approach. – YasserKhalil Feb 11 '21 at 12:43
No problem for me, ask as much as you can, you need to change the `adaptiveMethod` for the third image. See the answer under `Update-2` – Ahmet Feb 11 '21 at 12:45
The issue is you need a deep-learning-based method. The method which is invariant to all changes in the input image. Maybe you should google `east-text-detector` – Ahmet Feb 11 '21 at 12:46
Thank you very very much for the great support in that issue. Thanks a lot. I wish I could increase the rep points again and again and again ..!! – YasserKhalil Feb 11 '21 at 14:46
Can you please have a look at this question https://stackoverflow.com/questions/66183322/get-numbers-from-cropped-image-pytesseract? – YasserKhalil Feb 13 '21 at 08:49
