I know this could be done by applying a threshold and some other techniques using OpenCV, but in that case a different threshold value might be required for different images.
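To be concrete about the threshold problem: a single global threshold breaks when lighting varies, whereas an adaptive threshold compares each pixel to its local neighborhood mean. A minimal sketch of that idea in plain NumPy (the `block` and `offset` values here are arbitrary choices, and OpenCV's `cv2.adaptiveThreshold` does the same thing much faster):

```python
import numpy as np

def adaptive_threshold(gray, block=15, offset=10):
    """Mark a pixel as ink if it is darker than its local mean minus offset.

    gray:   2-D uint8 grayscale image (0 = black, 255 = white)
    block:  side length of the square neighborhood (odd)
    offset: how much darker than the local mean a pixel must be
    Returns a binary image with ink = 255, background = 0.
    """
    g = gray.astype(np.float64)
    pad = block // 2
    padded = np.pad(g, pad, mode='edge')
    # Local mean via a separable box filter (sliding windows per axis)
    win_rows = np.lib.stride_tricks.sliding_window_view(padded, block, axis=0)
    row_means = win_rows.mean(axis=-1)
    win_cols = np.lib.stride_tricks.sliding_window_view(row_means, block, axis=1)
    local_mean = win_cols.mean(axis=-1)
    return np.where(g < local_mean - offset, 255, 0).astype(np.uint8)
```

The per-neighborhood mean is what makes the threshold adapt across an unevenly lit page, which is exactly the case a single global value cannot handle.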
I tried applying an encoder-decoder neural network. What I noticed is that it worked for images in which the characters were large, i.e. each image had around 4 characters, but as I increased the number of characters, and equivalently reduced the size of each character in an image, the result was very bad.
I need very accurate output so that each character can then be fed to the recognition model.
Grayscale input image to the neural network
Target/ground truth (output image)
from keras.models import Sequential
from keras.layers import Lambda, Conv2D, MaxPooling2D, UpSampling2D, Dropout

def encoder_decoder(input_shape):
    return Sequential([
        # Normalize pixel values from [0, 255] to [-1, 1]
        Lambda(lambda x: x / 127.5 - 1.0, input_shape=input_shape),

        # Encoder: four conv blocks, each halving the spatial resolution
        Conv2D(8, (3, 3), activation='relu', padding='same'),
        Conv2D(8, (3, 3), activation='relu', padding='same'),
        MaxPooling2D((2, 2), strides=(2, 2)),
        Dropout(0.2),

        Conv2D(16, (3, 3), activation='relu', padding='same'),
        Conv2D(16, (3, 3), activation='relu', padding='same'),
        MaxPooling2D((2, 2), strides=(2, 2)),
        Dropout(0.2),

        Conv2D(32, (3, 3), activation='relu', padding='same'),
        Conv2D(32, (3, 3), activation='relu', padding='same'),
        MaxPooling2D((2, 2), strides=(2, 2)),
        Dropout(0.2),

        Conv2D(64, (3, 3), activation='relu', padding='same'),
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        MaxPooling2D((2, 2), strides=(2, 2)),
        Dropout(0.2),

        # Bottleneck
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        Conv2D(128, (3, 3), activation='relu', padding='same'),

        # Decoder: four upsampling blocks restoring the resolution
        UpSampling2D(size=(2, 2)),
        Dropout(0.2),
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        Conv2D(64, (3, 3), activation='relu', padding='same'),

        UpSampling2D(size=(2, 2)),
        Dropout(0.2),
        Conv2D(32, (3, 3), activation='relu', padding='same'),
        Conv2D(32, (3, 3), activation='relu', padding='same'),

        UpSampling2D(size=(2, 2)),
        Dropout(0.2),
        Conv2D(16, (3, 3), activation='relu', padding='same'),
        Conv2D(16, (3, 3), activation='relu', padding='same'),

        UpSampling2D(size=(2, 2)),
        Dropout(0.2),
        Conv2D(8, (3, 3), activation='relu', padding='same'),
        Conv2D(8, (3, 3), activation='relu', padding='same'),

        # 1x1 convolution producing the single-channel output image
        Conv2D(1, (1, 1), activation='relu'),
    ])
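One arithmetic detail worth noting about this architecture: with four 2x2 poolings followed by four 2x upsamplings, each spatial dimension must be divisible by 2^4 = 16 for the decoder output to match the target size, because Keras' `MaxPooling2D` floors odd dimensions. My height of 600 is not divisible by 16:

```python
def round_trip(dim, n_pool=4):
    """Spatial size after n_pool stride-2 poolings and n_pool 2x upsamplings."""
    for _ in range(n_pool):
        dim //= 2             # each MaxPooling2D((2, 2)) floors odd sizes
    return dim * 2 ** n_pool  # each UpSampling2D((2, 2)) doubles

print(round_trip(800))  # 800: divisible by 16, round-trips exactly
print(round_trip(600))  # 592: 600 -> 300 -> 150 -> 75 -> 37, then 37 * 16
```

So an 800x600 input comes back as 800x592 unless the image is padded or resized to a multiple of 16 in both dimensions first.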
The background also contains lines (which need to be removed as well) that are not exactly the same across different images. They also aren't exactly parallel, so an FFT-based approach won't work. The handwriting comes from different people.
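For the ruled background lines, one classical baseline is a morphological opening with a long horizontal kernel, which keeps only long horizontal runs of ink. A NumPy sketch of that idea (the kernel length `min_len` is an arbitrary choice, and lines with noticeable slant would need a more tolerant kernel or a rotated one):

```python
import numpy as np

def detect_horizontal_lines(binary, min_len=15):
    """Morphological opening with a 1 x min_len horizontal kernel.

    binary: 2-D array where foreground (ink) is nonzero.
    Returns a boolean mask of pixels belonging to horizontal runs at
    least min_len pixels long -- candidate ruled-background lines.
    """
    b = binary.astype(bool)
    left, right = min_len // 2, min_len - 1 - min_len // 2
    # Erosion: a pixel survives only if the whole horizontal window is ink.
    pad = np.pad(b, ((0, 0), (left, right)), constant_values=False)
    windows = np.lib.stride_tricks.sliding_window_view(pad, min_len, axis=1)
    eroded = windows.all(axis=-1)
    # Dilation: grow the surviving runs back out to their original extent.
    pad = np.pad(eroded, ((0, 0), (left, right)), constant_values=False)
    windows = np.lib.stride_tricks.sliding_window_view(pad, min_len, axis=1)
    return windows.any(axis=-1)
```

Character strokes are shorter than `min_len` horizontally, so they are erased by the erosion and never come back in the dilation; the detected mask can then be subtracted from (or inpainted out of) the binarized image.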
I trained the neural network on a small set of images (70 to 100) until it overfit, as a sanity check. The input images are first converted to grayscale, i.e. the whole process takes place in grayscale. The size of an image is 800x600; since I later need to draw contours and apply a classification algorithm, I cannot produce a smaller output.
Earlier I trained on clear images containing a smaller number of characters per image (ignore that the background here is '0' while in the next example it is '255'):
Later, on this kind of image:
Is it possible to reconstruct characters/digits at such a small size?
How should I improve its performance? Should the improvement be in the architecture, the size of the data set, the resolution, or something else?
And if there are better approaches, I would be glad to know about them.