-1

enter image description here

Say i have this old manuscript ..What am trying to do is making the manuscript such that all the characters present in it can be perfectly recognized what are the things i should keep in mind ? While approaching such a problem any methods for the same?

Please help thank you

Asp.net my life
  • 386
  • 4
  • 11

2 Answers2

3

Some graphics applications have macro recorders (e.g. Paint Shop Pro). They can record a sequence of operations applied to an image and store them as macro script. You can then run the macro in a batch process, in order to process all the images contained in a folder automatically. This might be a better option, than re-inventing the wheel.

I would start by playing around with the different functions manually, in order to see what they do to your image. There are an awful number of things you can try: Sharpening, smoothing and remove noise with a lot of different methods and options. You can work on the contrast in many different ways (stretch, gamma correction, expand, and so on).

In addition, if your image has a yellowish background, then working on the red or green channel alone would probably lead to better results, because then the blue channel has a bad contrast.

Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188
2

Do you mean that you want to make it easier for people to read the characters, or are you trying to improve image quality so that optical character recognition (OCR) software can read them?

I'd recommend that you select a specific goal for readability. For example, you might want readers to be able to read the text 20% faster if the image has been processed. If you're using OCR software to read the text, set a read rate you'd like to achieve. Having a concrete goal makes it easier to keep track of your progress.

The image processing book Digital Image Processing by Gonzalez and Woods (3rd edition) has a nice example showing how to convert an image like this to a black-on-white representation. Once you have black text on a white background, you can perform a few additional image processing steps to "clean up" the image and make it a little more readable.

Sample steps:

  1. Convert the image to black and white (grayscale)
  2. Apply a moving average threshold to the image. If the characters are usually about the same size in an image, then you shouldn't have much trouble selecting values for the two parameters of the moving average threshold algorithm.
  3. Once the image has been converted to just black characters on a white background, try simple operations such as morphological "close" to fill in small gaps.
  4. Present the original image and the cleaned image to adult readers, and time how long it takes for them to read each sample. This will give you some indication of the improvement in image quality.

A technique call Stroke Width Transform has been discussed on SO previously. It can be used to extract character strokes from even very complex backgrounds. The SWT would be harder to implement, but could work for quite a wide variety of images: Stroke Width Transform (SWT) implementation (Java, C#...)

The texture in the paper could present a problem for many algorithms. However, there are technique for denoising images based on the Fast Fourier Transform (FFT), an algorithm that you can use to find 1D or 2D sinusoidal patterns in an image (e.g. grid patterns). About halfway down the following page you can see examples of FFT-based techniques for removing periodic noise: http://www.fmwconcepts.com/misc_tests/FFT_tests/index.html

If you find a technique that works for the images you're testing, I'm sure a number of people would be interested to see the unprocessed and processed images.

Community
  • 1
  • 1
Rethunk
  • 3,976
  • 18
  • 32