I'm writing a system to drop out certain field borders from a form image. The fields may contain handwriting, which I need to preserve even where it crosses the field border.
I have two images that line up pixel for pixel (both are produced by a scanner): one color image (converted to the HSV colorspace) and one black-and-white image.
I would like to remove (pluck) the field border pixels from the black and white image, given the colors in the color image.
I have an advantage in that I know a priori the exact location of each field and the widths/heights of its border lines.
My current implementation works as follows for each field. First, I scan the field border on the color image and calculate an average HSV value for it. Since I know exactly where the field border is, I only visit "field border" pixels; I may also visit a few handwriting pixels where they cross the border, but the idea is that they won't skew the average very much. Once I have the "average" HSV value for the field border, I scan the border again and, for each pixel, compute the following delta function:
If the delta between the current pixel and the average HSV is less than 0.07 (found empirically), the colors are close together and I set the pixel to white; otherwise I keep the pixel as black.
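For reference, the two-pass procedure described above could be sketched roughly like this in Python with NumPy. This is only a sketch under assumptions: HSV channels are taken to be normalized to [0, 1], hue is treated as circular, and `delta` is a stand-in Euclidean distance, not the exact formula from the post. The function names and the `use_saturation` switch are hypothetical; only the 0.07 threshold comes from the text.

```python
import numpy as np

def hue_diff(h1, h2):
    """Circular difference between hues normalized to [0, 1)."""
    d = np.abs(h1 - h2)
    return np.minimum(d, 1.0 - d)

def mean_border_hsv(hsv, border_mask):
    """Average HSV over the border pixels (pass 1); hue is averaged on the circle."""
    px = hsv[border_mask]                       # shape (N, 3)
    angles = px[:, 0] * 2.0 * np.pi
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    mean_h = (mean_angle / (2.0 * np.pi)) % 1.0
    return np.array([mean_h, px[:, 1].mean(), px[:, 2].mean()])

def delta(hsv, ref, use_saturation=True):
    """ASSUMED stand-in distance; the original delta formula is not reproduced here."""
    dh = hue_diff(hsv[..., 0], ref[0])
    ds = hsv[..., 1] - ref[1] if use_saturation else 0.0
    dv = hsv[..., 2] - ref[2]
    return np.sqrt(dh ** 2 + ds ** 2 + dv ** 2)

def drop_out_border(bw, hsv, border_mask, threshold=0.07, use_saturation=True):
    """Pass 2: whiten border pixels whose color is close to the border average."""
    ref = mean_border_hsv(hsv, border_mask)
    d = delta(hsv, ref, use_saturation)
    out = bw.copy()
    out[border_mask & (d < threshold)] = 255    # close to border color: drop out
    return out                                  # everything else is left untouched
```

Here `bw` is the black-and-white image as a 2-D `uint8` array, `hsv` is the aligned color image with shape `(H, W, 3)`, and `border_mask` is a boolean `(H, W)` array marking the known border pixels of one field. Passing `use_saturation=False` mimics the variant where saturation is left out of the distance.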
Here are some examples of a field:
Color Image:
Black&White Image Non-Dropped Out:
Dropped out Black&White Image where Saturation is not used in Equation:
Actual Dropped out Black & White Image with formula used in full (using all 3 components H,S & V)
The third image (the dropout without saturation) was produced with the formula
above, but with the saturation term left out (I was just playing around with things).
This is obviously not sensitive enough to color variations, but the full formula is
very sensitive to saturation changes, which are mainly caused by JPEG compression
artifacts in the image (example artifacts):
I think the 4th example is the best because it's really sensitive to color variations, so you're less likely to remove handwriting. The problem is that you're more prone to leave border pixels behind because of slight color differences caused by scanning or compression artifacts.
What are your thoughts on how to alleviate some of the color (saturation) variation that occurs within the field border? Would histograms help, perhaps with some quantization to reduce the number of bins?
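To make the histogram idea concrete, one possible reading of it is sketched below: quantize the border pixels into a coarse 3-D HSV histogram, take the most populated bin as the border color, and average only the pixels that fall in that bin. This is an untested idea, not a known fix; the function name and the bin counts `(16, 8, 8)` are assumptions, and HSV is again assumed to be normalized to [0, 1].

```python
import numpy as np

def dominant_border_hsv(hsv, border_mask, bins=(16, 8, 8)):
    """Reference color from the most populated bin of a quantized HSV histogram."""
    px = hsv[border_mask]                                   # shape (N, 3)
    bins_arr = np.asarray(bins)
    # Quantize each channel; clamp so a value of exactly 1.0 lands in the last bin.
    idx = np.minimum((px * bins_arr).astype(int), bins_arr - 1)
    flat = np.ravel_multi_index(tuple(idx.T), bins)         # one bin id per pixel
    counts = np.bincount(flat, minlength=int(np.prod(bins)))
    top_bin = np.argmax(counts)
    # Average only the pixels in the winning bin, so stray ink pixels are ignored.
    return px[flat == top_bin].mean(axis=0)
```

Compared with a plain average, handwriting pixels that cross the border fall into other bins and do not shift the reference color at all. Fewer bins make the estimate more tolerant of JPEG noise, at the cost of merging ink colors that are close to the border color.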
I'd like to hear any ideas people have.
Thank you.