I have searched, but I can't find an algorithm for despeckling a scanned document. I have already used a median filter to remove some of the smaller speckles.

I need an algorithm that removes large speckles from a document. I have tried Connected Component Labeling (using AForge), but there is no option to color an object white (remove it) if it is larger than X pixels.

Is there any way to delete objects on my picture that are larger than X pixels (specks, blobs, noise)?
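The operation being asked for can be sketched independently of any library. The following is a minimal, unoptimized illustration in plain Python (not AForge code): it labels 8-connected black blobs with a breadth-first search and paints any blob larger than a threshold white. The function name `remove_large_blobs`, the parameter `max_area`, and the convention that 1 means black and 0 means white are assumptions made for this example.

```python
from collections import deque

def remove_large_blobs(img, max_area):
    """Return a copy of a binary image with every blob larger than max_area erased.

    img is a list of rows; 1 marks a black (foreground) pixel, 0 a white one.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 8-connected component with a breadth-first search.
            blob = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Paint oversized components white.
            if len(blob) > max_area:
                for by, bx in blob:
                    out[by][bx] = 0
    return out
```

For OCR input, `max_area` would have to be larger than the biggest character on the page, otherwise text is erased along with the noise.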

Edit:

Here is the document that I am trying to prepare for OCR: Original Document

As we can see, there is a lot of noise to the left of the text. I blurred the image to soften that noise and then binarized it, which gave me something like this: After Binarization

Now I need to remove the large black area on the left, but I don't know how to do that.

  • Have you tried using tesseract? – ernest Nov 10 '15 at 16:24
  • Could you please post a picture or a link to a picture? For an image processing question, assume you should always post a sample image. Even if you see many different kinds of speckle/noise in your document, having a few sample images will help those trying to give you advice. There are quite a few ways to eliminate blobs/speckle, but different techniques will be chosen depending on the types of noise you typically see. – Rethunk Nov 11 '15 at 03:00
  • Thank you for answering. I have edited the question and added some more info and images. I have not tried using tesseract, I am trying to implement all the algorithms myself... – Nemanja Zivkovic Nov 11 '15 at 20:00
  • Is cropping the image an option? i.e. find the area of black and crop to the inside edge of it – TheLethalCoder Nov 24 '15 at 16:28

1 Answer

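A quick attempt in Matlab. `imclearborder` removes bright regions that are connected to the image border, so the image is inverted first (making the black blobs bright), cleared, and then inverted back.

% Read the binarized scan (black text and noise on a white background).
img = imread('xWFEC.png');
% Invert, clear the blobs that touch the image border, then invert back.
img = imcomplement(imclearborder(imcomplement(img)));
figure; imshow(img);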

Output:

enter image description here
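Since the question is about C#, here is a library-free sketch of the same idea in plain Python that could be ported. It is not how Matlab implements `imclearborder`; it is a simple flood fill that, for a binary image, should give a comparable result: every black blob that touches the image border is painted white. The function name `clear_border` and the convention that 1 means black and 0 means white are assumptions made for this example.

```python
from collections import deque

def clear_border(img):
    """Return a copy of a binary image without the blobs that touch its border.

    img is a list of rows; 1 marks a black (foreground) pixel, 0 a white one.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    queue = deque()
    # Seed the fill with every foreground pixel lying on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and out[y][x] == 1:
                out[y][x] = 0
                queue.append((y, x))
    # Spread through 8-connected neighbours, erasing as we go.
    while queue:
        cy, cx = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and out[ny][nx] == 1:
                    out[ny][nx] = 0
                    queue.append((ny, nx))
    return out
```

This only helps when the noise reaches the edge of the scan, as the black area on the left of the sample image does. Text that touches the border would be removed as well.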

Huá dé ní 華得尼
  • Could you add an explanation of how this works? This is a `C#` question, and as far as I know those functions, or similar ones, are not available in standard `C#`, so users would need to implement them themselves. – TheLethalCoder Nov 24 '15 at 16:20
  • @TheLethalCoder, Thanks again. As far as I know we can use 'EmguCV' (http://www.emgu.com/wiki/index.php/Main_Page) with C#. This SO answer (http://stackoverflow.com/questions/24731810/segmenting-license-plate-characters) clearly explains how to implement Matlab 'imclearborder' with OpenCV. – Huá dé ní 華得尼 Nov 25 '15 at 01:05