
I've created an iPhone app that can scan an image of a page of graph paper and can then tell me which squares have been blacked out and which squares are blank.

I do this by scanning from left to right, using the graph paper's lines as guides. When I encounter a graph-paper line, I start looking for black until I hit the next line. At that point, instead of continuing along the scan line, I scan the whole square for black, then move on to the next box. At the end of the row, I skip down a set number of pixels before starting on a new row (since I have already figured out how tall each box is).

This sort of works, but there are problems. Sometimes I mistake the graph lines themselves for "black". And if the image is skewed, or the lighting isn't uniform across the page, I don't get good results.

What I'd like to do is specify a few "alignment" boxes, then resize, rotate, and skew the picture so that it lines up with them. Once the image is aligned, I would know where all the boxes are, so I wouldn't have to scan for them; I'd just sample inside each box's known location to see whether it is black. This should be faster and more reliable. And if I were to operate on images coming from the camera, I'd have the flexibility of asking the user to line the picture up with the alignment marks, rather than having to align the image myself.
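For concreteness, the warping step might look something like the following Matlab sketch. The mark coordinates, page dimensions, and image file are all placeholders here, not a finished implementation:

im = im2double(rgb2gray(imread('page.jpg')));   % the photo to rectify

% Placeholder values: detected centers of the four alignment marks (pixels)
moving = [42 38; 980 52; 965 1240; 30 1225];

% Where those marks should sit on an ideal W-by-H rectified page
W = 1000; H = 1280;
fixed = [1 1; W 1; W H; 1 H];

% A projective transform handles skew/perspective as well as rotation
tform = cp2tform(moving, fixed, 'projective');
aligned = imtransform(im, tform, 'XData', [1 W], 'YData', [1 H]);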

Given that this is my first image-processing project, I feel like I am reinventing the wheel. I'd like suggestions on how to do this, and whether to use libraries like OpenCV.

I am enclosing an image similar to what I would like processed. I am looking for a list of all squares that have a significant amount of black marking, i.e. A8, C4, E7, G4, H1, J9.

[Image: sample grid with squares A8, C4, E7, G4, H1 and J9 blacked out]

Issues to be aware of:

  • Lighting may not be ideal, but it should be relatively consistent across the image (i.e. no shadows)
  • All squares may be empty, or all may be dark, and the algorithm needs to be able to determine that
  • The image may be skewed or rotated about any of the axes. Rotation about the z-axis may be easy to fix, but rotation about the x- or y-axis makes one side of the image wider than the other. However, if I scan the image in real time as it comes from the camera, I can ask the user to line the alignment marks up with marks on the screen. How do I best verify that alignment and give the user appropriate feedback? Just checking that the four corners are dark could produce a false positive when the camera is pointed at a black surface.
  • Not every square will be blacked out equally or consistently, but I think there will be enough black to make it unambiguous to a human eye.
  • The blue grid may be useful, but there are cases where the black markings overlap it, so I think a virtual grid is probably better than relying on the printed one. Using the alignment markers to align the image should allow a precise virtual grid to be laid out, and the contents of each grid box could then be sampled to see if it is predominantly black, rather than scanning left to right (see the sketch after this list). Here is another image with more markings on the grid. In addition to the previous markings in A8, C4, E7, G4, H1 and J9, I have marked E2, G8, G9, I4 and J4, and you can see how the blue grid is obscured.

[Image: second sample grid, with markings overlapping the blue grid lines]

  • This is the first phase of this project. Eventually I'd like to scale this algorithm to handle at least a few hundred slots, and possibly different colors.
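In Matlab terms, the virtual-grid sampling described above might look roughly like this. It is only a sketch: it assumes an already aligned, thresholded image bw, and the 10x10 grid size is a placeholder:

% bw: logical image of the aligned page, true where there is black ink
nRows = 10; nCols = 10;          % assumed grid dimensions
cellH = size(bw,1) / nRows;
cellW = size(bw,2) / nCols;
filled = false(nRows, nCols);
for r = 1:nRows
    for c = 1:nCols
        boxPix = bw(round((r-1)*cellH)+1:round(r*cellH), ...
                    round((c-1)*cellW)+1:round(c*cellW));
        filled(r,c) = mean(boxPix(:)) > 0.5;   % predominantly black?
    end
end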
    Post a couple of actual pictures please. – Maurits Mar 19 '12 at 21:29
  • @Maurits added. Thanks for asking for it. – mahboudz Mar 19 '12 at 22:06
  • May the blue grid be used by the algorithm? – Maurits Mar 20 '12 at 22:03
  • In the first iteration, yes. However, down the road, to make the mechanism robust, it probably should create its own virtual boundaries instead of relying on the blue grid. I am posting another picture to show how markings may overlap the blue grid and could cause problems. – mahboudz Mar 20 '12 at 22:14

2 Answers


To start with, this problem reminded me a bit of a couple of demos that might be useful to learn from (an iPhone Sudoku grabber and a microarray-analysis write-up).

Personally, I think the simplest approach would be to detect the squares in your image.

1) Remove the background and small cruft

% Assumes im is a grayscale image scaled to [0,1], e.g.:
% im = im2double(rgb2gray(imread('grid.jpg')));
f_makebw = @(I) im2bw(I.data, double(median(I.data(:)))/1.3);  % threshold at a fraction of the block's median
bw = ~blockproc(im, [128 128], f_makebw);  % block-wise, so uneven lighting is tolerated
bw = bwareaopen(bw, 30);                   % drop connected components smaller than 30 px

[Image: result after block-wise thresholding and despeckling]

2) Remove everything but the squares and circles.

se = strel('disk', 5);   % disk roughly matching the grid-line width
bw = imerode(bw, se);    % erosion removes the thin grid lines; filled blobs survive

% Trace the boundaries of the squares and circles that remain
[B, L] = bwboundaries(bw, 'noholes');

3) Detect the squares using the 'Extent' property from regionprops. The 'Extent' metric measures what proportion of the bounding box is filled, which makes it a nice measure for distinguishing circles from squares: a square fills its bounding box almost entirely, while a circle fills only about pi/4 ≈ 0.79 of it.

stats = regionprops(L, 'Extent');
extent = [stats.Extent];
idx1 = find(extent > 0.8);   % extents above 0.8 are square-like
bw = ismember(L, idx1);      % keep only those regions

[Image: only the square-like regions remain]

4) This leaves you with the features to synchronize or rectify the image against. An easy and robust way to do this is via the autocorrelation function (ACF).

[Image: 2-D autocorrelation of the detected features, showing distinct peaks]
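One way to compute such an ACF from the binary feature image, with an arbitrary peak-picking threshold (a sketch, not part of the pipeline above):

% 2-D autocorrelation via the FFT (Wiener-Khinchin theorem), zero-padded
F   = fft2(double(bw), 2*size(bw,1)-1, 2*size(bw,2)-1);
acf = fftshift(real(ifft2(abs(F).^2)));

% Keep only prominent local maxima as peaks
peaks = imregionalmax(acf) & (acf > 0.5 * max(acf(:)));
[py, px] = find(peaks);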

The ACF gives nice peaks, which are easily detected. These peaks can be matched against the ACF peaks of a template image via the Hungarian algorithm. Once matched, you can correct rotation and scaling, since you now have a linear system you can solve:

x = A x'

where x holds the template peak coordinates, x' the matched image peak coordinates, and A is the rotation-and-scaling matrix you solve for.
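With the matched peaks stacked as N-by-2 arrays, the least-squares solution is short. In this sketch, X and Xp are assumed to come from the matching step above:

% Solve X ≈ Xp * A' in the least-squares sense (X = template peaks, Xp = image peaks)
A = (Xp \ X)';   % 2x2 rotation + scaling

% Or let the Image Processing Toolbox build the whole transform:
tform = cp2tform(Xp, X, 'nonreflective similarity');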

Translation can then be corrected using run-of-the-mill cross-correlation against the same predefined template.
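For instance, with normxcorr2, assuming a template image of the feature layout:

c = normxcorr2(template, bw);             % normalized cross-correlation
[~, imax] = max(c(:));
[ypeak, xpeak] = ind2sub(size(c), imax);  % location of the best match
yoffset = ypeak - size(template, 1);      % translation to correct for
xoffset = xpeak - size(template, 2);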

If all goes well, you now have an aligned or synchronized image, which should help considerably in determining the positions of the dots.

  • Thank you. This gives me much to think about. Your examples are Matlab code, correct? The iPhone Sudoku link was very helpful. I'll also be looking into OpenCV. – mahboudz Mar 22 '12 at 21:28
  • It is Matlab, I can share a function that does everything if you want. With iPhone development I can't help you... – Maurits Mar 22 '12 at 21:51
  • I'd love to see some C code, if available. But getting general ideas like you've given me has been helpful too. I can easily do black & white and an inverse. I am not sure what bwareaopen, strel and imerode do, but I suppose I can look them up easily. I'm also curious as to how much of this is done by OpenCV. And then, I have this thought that once I have everything correlated, I can just look at the locations where I expect to find circles and detect them, since they'll be either white or black and won't have to be contrasted with their surroundings. – mahboudz Mar 24 '12 at 02:09
  • Also, would you say that MatLab would be a good tool to have to test all the possibilities before deciding to write my own code to do something similar? – mahboudz Mar 24 '12 at 02:10
  • All can be done by opencv, I believe. In general I think for image processing Matlab is the most easy tool and environment to prototype (and learn) in. Scripting it is easy, and its toolboxes and documentation form a good comprehensive package. The free option would be python + numpy/scipy and opencv bindings. – Maurits Mar 24 '12 at 11:14
  • Thanks for the pointer to the microarray analysis article. I used to do more microarray work at my day job, and now that I'm getting into iOS image processing myself it's fascinating to see the overlap there. – Brad Larson Mar 26 '12 at 19:22
  • I'm looking to buy MatLab. Any idea if I'll need the Computer Vision toolbox for doing the above? – mahboudz May 16 '12 at 23:37
  • @mahboudz, no only the image processing toolbox. – Maurits May 17 '12 at 07:35

I've been starting to do something similar using my GPUImage iOS framework, so that might be an alternative to doing all of this in OpenCV or something else. As its name indicates, GPUImage is entirely GPU-based, so it can have some tremendous performance benefits over CPU-bound processing (up to 180X faster for things like processing live video).

As a first stage, I took your images and ran them through a simple luminance thresholding filter with a threshold of 0.5 and arrived at the following for your two images:

[Images: both inputs after simple luminance thresholding]

I just added an adaptive thresholding filter, which attempts to correct for local illumination variances and works really well for picking out text. However, on your images it uses too small an averaging radius to handle your blobs well:

[Images: both inputs after adaptive thresholding]

and seems to bring out your grid lines, which it sounds like you wish to ignore.
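If you prototype the same idea in Matlab, an adaptive threshold is essentially a comparison of each pixel against a local average; the radius and offset below are guesses that would need tuning so the window spans more than a grid cell:

% im: grayscale image in [0,1]; ink is darker than its neighborhood
localMean = imfilter(im, fspecial('disk', 25), 'replicate');
bw = im < localMean - 0.05;   % small offset suppresses noise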

Maurits provides a more comprehensive description of what you could do, but there might be a way to implement these processing operations as high-performance GPU-based filters instead of relying on slower OpenCV versions of the same calculations. If you could grab rotation and scaling information from this thresholded image, you could construct a transform that could also be applied as a filter to your thresholded image to produce your final aligned image, which could then be downsampled and read out by your application to determine which grid locations were filled in.

These GPU-based thresholding operations run in less than 2 ms for 640x480 frames on an iPhone 4, so it might be possible to chain filters together to analyze incoming video frames as fast as the device's video camera can provide them.
