This is the image input.

I am using Python with OpenCV. I did some pre-processing and found the contours using:

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

Then I did the following to save each character:

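import cv2
from PIL import Image

img1 = cv2.imread("test26.png")
nu = 1
fin = "final"
for cnt in contours:
    x,y,w,h = cv2.boundingRect(cnt)
    img2 = img1[y:y+h, x:x+w]  # crop the bounding box of this contour
    # cv2.imread returns BGR; convert to RGB before handing the array to PIL
    img3 = Image.fromarray(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    filename = fin + str(nu) + ".png"
    nu = nu + 1
    img3.save(filename)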

But the characters are saved in a tree-like order, and I don't understand that order.

My intention is to extract the characters one by one, run OCR on them in reading order, and save the result as text.

Ishara Madhawa
Aswani KV

2 Answers

You can try to find the location of each letter by using the center (centroid) of its contour:

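for cnt in contours:
    M = cv2.moments(cnt)  # cv2.moments takes a single contour, not the whole list
    if M["m00"] == 0:     # skip degenerate contours with zero area
        continue
    cX = int(M["m10"] / M["m00"])
    cY = int(M["m01"] / M["m00"])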

Then you can determine the order of the characters using cX and cY (if the text is on a single line, cX alone is enough).
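As a minimal sketch of that ordering step, assuming one (cX, cY) centroid has already been computed per contour, the helper below groups centroids into text lines by cY and then sorts each line by cX. The name `reading_order` and the `line_tol` parameter are hypothetical, chosen for illustration; they are not part of OpenCV, and the tolerance would need tuning for a real image.

```python
def reading_order(centroids, line_tol=10):
    """Return contour indices sorted top-to-bottom, then left-to-right.

    centroids: list of (cX, cY) tuples, one per contour.
    line_tol:  assumed maximum vertical distance (in pixels) between a
               centroid and the first centroid of its text line.
    """
    # Visit centroids from top to bottom so that lines are built in order.
    by_y = sorted(range(len(centroids)), key=lambda i: centroids[i][1])

    lines = []  # each entry is a list of indices belonging to one text line
    for i in by_y:
        cy = centroids[i][1]
        if lines and abs(cy - centroids[lines[-1][0]][1]) <= line_tol:
            lines[-1].append(i)   # close enough vertically: same line
        else:
            lines.append([i])     # start a new line

    # Within each line, order the characters by their x coordinate.
    order = []
    for line in lines:
        order.extend(sorted(line, key=lambda i: centroids[i][0]))
    return order


# Illustrative centroids: three characters on one line, two on the next.
centroids = [(120, 52), (15, 48), (60, 50), (20, 101), (70, 99)]
print(reading_order(centroids))  # [1, 2, 0, 3, 4]
```

The returned indices refer back to the original contour list, so the files can be numbered in reading order (final1, final2, ...) by iterating over `order` instead of over `contours` directly.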

Ibrahim
  • The idea is good, but I am saving the letters using the names final1, final2, ... – Aswani KV Mar 25 '17 at 11:00
  • As the letters are being detected in a random order, how can I compare them and find the correct position? – Aswani KV Mar 25 '17 at 11:01
  • What do you think about using an OCR tool? You can use Tesseract OCR with Python. To set up Tesseract via Homebrew (I tried this on macOS): 1. Install Homebrew: ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" 2. Install Tesseract with Homebrew: brew install tesseract 3. Install a Python wrapper for Tesseract OCR, e.g. pip install pytesseract – Ibrahim Mar 27 '17 at 06:14

This code sorts the bounding boxes from left to right and achieves what was probably intended, doesn't it?

import cv2

strFormula = "1!((x+1)*(x+2))"  # '!' stands in for a character that is not allowed in a file name
img = cv2.imread("test26.png")
imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cv2.imread returns BGR, not RGB
ret, imgThresh = cv2.threshold(imgGray, 127, 255, cv2.THRESH_BINARY)

# findContours returns 2 values in OpenCV 2.x and 4.x but 3 values in 3.x;
# the contours are the second-to-last element in every version.
contours = cv2.findContours(imgThresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]

# Sorting the (x, y, w, h) tuples orders the boxes left to right by x.
lstBoundingBoxes = sorted(cv2.boundingRect(cnt) for cnt in contours)

# Skip the first element (its 'bounding box' is the entire image).
for charNo, (x, y, w, h) in enumerate(lstBoundingBoxes[1:], start=1):
    fName = "charAtPosNo-" + str(charNo).zfill(2) + "_is_[ " + strFormula[charNo-1] + " ]" + ".png"
    cv2.imwrite(fName, img[y:y+h, x:x+w])
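Skipping the first sorted element assumes that the whole-image box always sorts first and that no other unwanted boxes exist. As a hedged alternative sketch, the boxes could instead be filtered by area before sorting. The helper name `filter_boxes` and the thresholds `min_area` and `max_frac` are hypothetical values picked for illustration, not anything prescribed by OpenCV.

```python
def filter_boxes(boxes, img_w, img_h, min_area=20, max_frac=0.9):
    """Keep (x, y, w, h) boxes that are plausibly single characters.

    min_area: assumed smallest area (pixels) of a real character.
    max_frac: assumed largest fraction of the image one character may cover.
    """
    img_area = img_w * img_h
    kept = []
    for (x, y, w, h) in boxes:
        area = w * h
        if area < min_area:              # too small: likely noise
            continue
        if area > max_frac * img_area:   # too large: likely the image border
            continue
        kept.append((x, y, w, h))
    return sorted(kept)  # tuple sort orders the boxes left to right by x


# Illustrative boxes: whole image, two characters, one speck of noise.
boxes = [(0, 0, 200, 60), (150, 10, 12, 40), (5, 12, 10, 38), (80, 30, 2, 2)]
print(filter_boxes(boxes, img_w=200, img_h=60))
# [(5, 12, 10, 38), (150, 10, 12, 40)]
```

With real data, `boxes` would come from `cv2.boundingRect` and the image size from `img.shape`.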
Claudio