
I am trying to recognize seven-segment digits in an image using Tess4J.

My input image:

[image]

I have preprocessed it as follows:

1] Cropped the image.

[image]

2] Converted it to binary.

[image]

I want to remove the jagged edges of the digits in the image. How can I accomplish that?

I have tried different traineddata files from GitHub, but none of them works as I would like.

How can I create traineddata manually?

Thanks in advance for your suggestions.

Don Chakkappan

1 Answer


You can try a combination of Sobel filters (to find the edges, which are then subtracted from the image to thin the strokes) and Gaussian filters (to blur the image).

You didn't specify which image-manipulation API you are using in Java, and as I'm not familiar with Tess4J, I will show what can be accomplished in Python. You can use your preferred image library in Java; the process will be the same:

import numpy
import scipy.ndimage
from PIL import Image


def save_image(img_data, counter):
    # Rescale to the 0-255 range so the float data can be stored as an 8-bit image
    img_fn = "img_{}.jpg".format(counter)
    lo, hi = img_data.min(), img_data.max()
    scaled = (img_data - lo) * 255.0 / max(hi - lo, 1e-12)
    Image.fromarray(scaled.astype(numpy.uint8)).save(img_fn)


if __name__ == "__main__":
    # This loads the second image of your post
    img_0 = numpy.asarray(Image.open("TqO53.jpg").convert("RGB"), dtype=float)
    # Average the colour channels to get a greyscale image
    img_0 = img_0.mean(axis=-1)
    #save_image(img_0, 0)

    # Obtain edges: Sobel derivative along each axis, then the gradient magnitude
    img_x = scipy.ndimage.sobel(img_0, 0)
    img_y = scipy.ndimage.sobel(img_0, 1)
    img_1 = numpy.hypot(img_x, img_y)
    #save_image(img_1, 1)

    # Remove edges from original image (i.e. thinning edges)
    img_2 = img_0 - img_1
    # Clamp dark and negative values left by the subtraction to black
    img_2[img_2 < 10] = 0
    save_image(img_2, 2)

    # Blur image if you want to get rid of the sketchy borders
    img_3 = scipy.ndimage.gaussian_filter(img_2, sigma=1)
    save_image(img_3, 3)

This will generate the following images:

img_2.jpg (with edges thinned):

[image]

img_3.jpg (blurred):

[image]

You can try both images to see which one gives better results with Tess4J. You may not need to blur the image after thinning the edges, since the thinned digits may already be easier to recognize.

If that is not enough, you can try thinning the digits themselves until the strokes are 1 pixel thick. That may work well with Tess4J.
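
A minimal sketch of thinning strokes down to 1 pixel, assuming the image has already been binarised into a boolean array where True marks digit pixels. It uses the Zhang-Suen thinning algorithm, which is one common choice and an assumption here, not something the answer names. The function name zhang_suen_thin is hypothetical.

```python
import numpy as np


def zhang_suen_thin(binary):
    """Thin a boolean image (True = foreground) towards 1-pixel-wide strokes."""
    # Pad with background so every original pixel has eight neighbours
    img = np.pad(np.asarray(binary, dtype=bool), 1).astype(np.uint8)
    h, w = img.shape
    # Neighbour offsets P2..P9: clockwise, starting from the pixel above
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    # These are views into img, so they always reflect its current state
    ring = [img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] for dy, dx in offsets]
    p2, p3, p4, p5, p6, p7, p8, p9 = ring
    centre = img[1:-1, 1:-1]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            # B: number of foreground neighbours
            b = sum(r.astype(int) for r in ring)
            # A: number of 0 -> 1 transitions walking once around the ring
            a = sum(((ring[i] == 0) & (ring[(i + 1) % 8] == 1)).astype(int)
                    for i in range(8))
            if step == 0:
                # First sub-iteration: peel south/east boundary pixels
                cond = (p2 * p4 * p6 == 0) & (p4 * p6 * p8 == 0)
            else:
                # Second sub-iteration: peel north/west boundary pixels
                cond = (p2 * p4 * p8 == 0) & (p2 * p6 * p8 == 0)
            remove = (centre == 1) & (b >= 2) & (b <= 6) & (a == 1) & cond
            if remove.any():
                centre[remove] = 0
                changed = True
    return centre.astype(bool)


if __name__ == "__main__":
    # Synthetic stand-in for one thick horizontal segment of a digit
    segment = np.zeros((9, 24), dtype=bool)
    segment[2:7, 2:22] = True
    thinned = zhang_suen_thin(segment)
    print("foreground pixels before:", segment.sum(), "after:", thinned.sum())
```

On a real photo the result depends heavily on how clean the binarisation is, since jagged edges tend to leave small spurs on the skeleton. Smoothing before thinning may help.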

Daniel
  • One thing about Gaussian filtering is that it makes small features like the decimal point harder to detect. – Andy Turner Mar 15 '15 at 23:22
  • It's true. When writing it I was thinking about how to increase the separation between the point and the 4 before applying the Gaussian, but I couldn't think of a good way that didn't involve thresholding. Maybe a more sensible border detection? – Daniel Mar 16 '15 at 02:37
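
The concern raised in the comments can be checked on a synthetic image. The sketch below blurs a thick segment and a 2x2 "decimal point" with the same Gaussian and compares their peak brightness afterwards. The image sizes and the helper name peaks_after_blur are assumptions for illustration, not measurements from the posted image.

```python
import numpy as np
from scipy import ndimage


def peaks_after_blur(sigma):
    """Return (segment_peak, dot_peak) after Gaussian blurring a test image."""
    img = np.zeros((40, 60))
    img[5:35, 10:16] = 255.0   # thick vertical segment, 6 px wide
    img[30:32, 40:42] = 255.0  # small 2x2 "decimal point"
    blurred = ndimage.gaussian_filter(img, sigma=sigma)
    # Measure the brightest remaining value inside each original feature
    segment_peak = blurred[5:35, 10:16].max()
    dot_peak = blurred[30:32, 40:42].max()
    return segment_peak, dot_peak


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):
        segment_peak, dot_peak = peaks_after_blur(sigma)
        print("sigma={}: segment peak {:.1f}, dot peak {:.1f}".format(
            sigma, segment_peak, dot_peak))
```

In this synthetic case the dot loses far more of its brightness than the segment, so a later threshold at mid-grey could drop it entirely. A smaller sigma, or blurring only the digit region, might be worth trying.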