
Today I'm seeking advice about the best ways of recognising drawn images. For example, if you use a Chinese/Japanese keyboard, you can draw those special signs with your finger; the drawing is recognised and the right symbol is placed in the text area. How can I do something like this? I was thinking about using cocos2d; could Core Image also help?

I'd also like to ask another question: let's say the screen is covered with a fog texture, and you have to swipe your finger to clean the window. How can that be done? As a good example, the game Where's My Water uses something like this: you swipe your finger to remove some ground and make space for the water.

I hope this is clear, and I would be grateful for any answer :)

Maciej Chrzastek
  • OCR is not exactly the easiest thing to do. http://en.wikipedia.org/wiki/Optical_character_recognition – CodeSmile Nov 01 '12 at 23:19
  • I don't want to read characters from images, I just want to compare if image A is the same or very similar to image B. Is it possible to do it without OCR? or is it part of it? – Maciej Chrzastek Nov 01 '12 at 23:54

1 Answer


There are a few SO posts about image recognition floating around. This one is probably the closest match to what you want, and Tom Gullen's answer is very comprehensive. You may also like to look at redmoskito's answer to this question.

One fairly basic method that I have seen that is not mentioned in either of these posts is the following:

(I can't take credit for this - if someone else can find the SO post this is from, please let me know!)

  1. Shrink the image you wish to compare to a small size (e.g. 4x4 px)
  2. Shrink the user's hand-drawn image to the same size
  3. Iterate over the pixels and compare their (for example, RGB) data to the shrunk reference image pixels.
  4. If the individual pixels are 'similar enough' (by whatever comparison threshold you choose) to the corresponding pixels of the shrunk reference image, you have a match.

This can work fairly well if you have a closed set of comparison images and you know that one of those images will be the one drawn (the idea being that each image will have a unique 4x4px 'fingerprint').

Its overall effectiveness depends on the algorithm you use to determine what counts as a "similar pixel" (e.g. similar RGB values, nearest-neighbour similarity, etc.) and, naturally, the larger your shrunk image is, the more exact the process will be. I have used this general procedure with reasonable success for recognising basic shapes and characters. You'll need a pretty good (and thoroughly tested) comparison algorithm to make this production quality, though.
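To make the "similar pixel" idea concrete, here is a minimal plain-C sketch. The function name, thresholds, and the 4x4 size are my own illustrative choices, not from any particular post: two RGBA pixels count as similar if every channel differs by less than a threshold, and two fingerprints match if a large enough fraction of their pixels are similar.

```c
#include <stdlib.h>

/* Illustrative sketch (names and thresholds are assumptions):
   compare two small RGBA "fingerprints" of pixelCount pixels each.
   A pixel is "similar" if every one of its 4 channels differs by
   less than channelThreshold; the fingerprints match if at least
   requiredFraction of the pixels are similar. */
static int fingerprintsMatch(const unsigned char *a, const unsigned char *b,
                             int pixelCount, int channelThreshold,
                             float requiredFraction) {
    int similarPixels = 0;
    for (int p = 0; p < pixelCount; p++) {
        int similar = 1;
        for (int c = 0; c < 4; c++) {
            int i = p * 4 + c;
            if (abs(a[i] - b[i]) >= channelThreshold) {
                similar = 0;
                break;
            }
        }
        similarPixels += similar;
    }
    return (float)similarPixels / pixelCount >= requiredFraction;
}
```

You would tune both thresholds empirically against your closed set of reference images.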

As for your second question, I assume you mean how to remove fog that you trace your finger over (like what happens in real life). One way of achieving this would be to detect where the finger is and "draw an alpha channel" which then acts as a mask for your fog image. Or, you could draw directly to the image and set the relevant pixels' alpha values to 0.
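As a sketch of the "set the relevant pixels' alpha values to 0" approach (the buffer layout and names are assumptions; a real app would push the modified buffer back into a texture or image each frame), clearing the fog around a touch point might look like:

```c
/* Illustrative sketch: "erase fog" by zeroing the alpha byte of every
   RGBA pixel within `radius` of the touch point (tx, ty). The zeroed
   alpha then masks out the fog where the finger has passed. */
static void wipeFog(unsigned char *rgba, int width, int height,
                    int tx, int ty, int radius) {
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int dx = x - tx, dy = y - ty;
            if (dx * dx + dy * dy <= radius * radius) {
                rgba[(y * width + x) * 4 + 3] = 0; /* alpha channel only */
            }
        }
    }
}
```

Calling this from your touch-moved handler for each touch location gives the continuous "wiping" effect.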

These are just some ideas, the area of image comparison and manipulation is huge. But hopefully this will provide a starting point for further exploration.

EDIT:

Apple provide two nice (iOS-compatible) functions for extracting pixel data. If you wrap them in a function:

+ (NSMutableData *)pixelDataFromImage:(UIImage *)image {

    NSMutableData *pixelData = (__bridge_transfer NSMutableData *)
    CGDataProviderCopyData(CGImageGetDataProvider(image.CGImage));
    return pixelData;
    // Return data is of the form [RGBA RGBA RGBA ....]
    //                             ^^^^ ^^^^ ^^^^
    //               Byte Index:   0123 4567 89..
    //                             ^    ^    ^
    //             Pixel Number:   px1  px2  px3
}

So to tie it together into a (minimal) algorithm based on the 4 steps above:

//...
NSMutableData *imagePixelData = [self pixelDataFromImage:image];
NSMutableData *referencePixelData = [self pixelDataFromImage:reference];
// Both image and reference are UIImages

if ([imagePixelData length] != [referencePixelData length]) {
    return 0.0f; // Can't compare, different number of pixels
}

Byte *imagePixelBytes = [imagePixelData mutableBytes];
Byte *referencePixelBytes = [referencePixelData mutableBytes];

int totalDifference = 0;
float averageDifference = 0;
int bytesCompared = 0;

for (NSUInteger i = 0; i < [imagePixelData length]; i++) {

    if ((i+1) % 4 == 0) { // Compare only alpha values in this example
                          // (compares images ignoring colour)

        int difference = abs(imagePixelBytes[i] - referencePixelBytes[i]);
        totalDifference += difference;
        bytesCompared += 1;
    }
}

averageDifference = (float)totalDifference / bytesCompared;
float similarity = 1.0f - (averageDifference / 255.0f);
return similarity;
// 1.0 => Exact match
// Now you need to determine a threshold for "how similar means 'the same'".

As I said this is only minimal, but it's one way of implementing the procedure outlined above. Certainly the two Core Graphics functions make life much easier, and once you have the data, you just end up comparing two byte-arrays. Note you'll still need to shrink the images first (with Core Graphics) - there are a few tutorials around for that (e.g. here).
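For illustration, here is what that shrink step computes, written as a plain-C box average over RGBA data. On iOS you would normally just draw the CGImage into a small CGBitmapContext instead; the function name and the assumption that the source dimensions divide evenly by the destination dimensions are mine.

```c
/* Illustrative sketch of the "shrink" step: box-average an RGBA image
   down to a small fingerprint (e.g. 4x4). Each destination pixel is the
   per-channel mean of a bw x bh block of source pixels. Assumes srcW
   and srcH are exact multiples of dstW and dstH. */
static void shrinkRGBA(const unsigned char *src, int srcW, int srcH,
                       unsigned char *dst, int dstW, int dstH) {
    int bw = srcW / dstW, bh = srcH / dstH; /* source block per dest pixel */
    for (int dy = 0; dy < dstH; dy++) {
        for (int dx = 0; dx < dstW; dx++) {
            for (int c = 0; c < 4; c++) {
                long sum = 0;
                for (int y = 0; y < bh; y++)
                    for (int x = 0; x < bw; x++)
                        sum += src[((dy * bh + y) * srcW + (dx * bw + x)) * 4 + c];
                dst[(dy * dstW + dx) * 4 + c] =
                    (unsigned char)(sum / (bw * bh));
            }
        }
    }
}
```

Core Graphics will do the same job (with better filtering) in a couple of calls, but the arithmetic above is the essence of it.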

Ephemera
  • Typically text converters also use stroke order and direction information to determine the character, although that is more intensive programming than repoguy is likely able to do. – AJMansfield Nov 02 '12 at 02:13
  • @PLPiper thank you for the answer, and I really like your idea to shrink images and compare pixels. I know it's not easy, but at least I have something to start with. Regarding the second question: if I draw alpha pixels onto an image, the shape of the image will still be the same as before drawing the alpha values. So if the image's shape stays the same and I'd like to detect whether there is a hole to fill with water, I won't be able to, right? I would have to let my "water sprite" fill everything that contains alpha pixels, since I can't change the image shape. Am I thinking right? – Maciej Chrzastek Nov 02 '12 at 15:15
  • AJMansfield, so far I'm not able to do anything described in this post; I'm looking for a place to start. I think your method is the same as LearnCocos2D's, to use OCR. – Maciej Chrzastek Nov 02 '12 at 15:18
  • @repoguy I've edited in some sample code showing one way to implement the algorithm. I'm certainly not suggesting you use it for serious production work - but hopefully it's a place to start and see the kinds of things needed for comparing images. – Ephemera Nov 03 '12 at 01:06