19

I am building an App for the parent of a friend of mine who sadly had a stroke and can no longer talk, read or spell. He can, however, produce rather detailed drawings.

I have currently built an App that can process an image of a drawing and detect basic shapes (lines, squares and triangles). The App can count how many of each shape has been drawn, so it knows the difference between an image with two squares as opposed to an image with just one.

This places a large amount of cognitive load on the user, who has to remember every combination of shapes and what it means. I am currently detecting the contours in the image via

findContours(maskMat, contours, hierarchy, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

What I would like to achieve is this: the user draws a shape and adds it to a bank of known drawings; then, each time he draws an image, the App compares every known image against the newly drawn one and stores a similarity value. The known image with the highest similarity value, provided it is above a threshold, is taken to be the image that was drawn.
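
To illustrate, the lookup loop I have in mind would look roughly like the sketch below; similarityBetween() is a hypothetical placeholder for whatever comparison technique turns out to be the right one (higher meaning more alike), which is exactly the part I am unsure about:

#include <opencv2/core/core.hpp>
#include <vector>

// Hypothetical placeholder: returns a similarity score for two drawings,
// higher meaning more alike. This is the part the question is about.
double similarityBetween(const cv::Mat& known, const cv::Mat& drawn);

// Sketch only: compare the new drawing against every known drawing and return
// the index of the best match, or -1 if nothing clears the threshold.
int bestKnownDrawing(const cv::Mat& drawn,
                     const std::vector<cv::Mat>& knownDrawings,
                     double threshold /* needs tuning */)
{
    int bestIndex = -1;
    double bestSimilarity = threshold;
    for (size_t i = 0; i < knownDrawings.size(); ++i) {
        double s = similarityBetween(knownDrawings[i], drawn);
        if (s > bestSimilarity) {
            bestSimilarity = s;
            bestIndex = (int)i;
        }
    }
    return bestIndex; // -1 means "no known drawing is similar enough"
}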

I have looked into OpenCV pattern matching as well as template matching, but with unreliable results.

I'm asking for advice on the best approach to achieve the result I'm hoping for.

I made a promotional video for my university lecture to illustrate what the App does. If you are interested, you can view it here: https://youtu.be/ngMUUIsLHoc

Thanks in advance.

Ste Prescott
  • did you try cv function matchShapes? http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html#double%20matchShapes%28InputArray%20contour1,%20InputArray%20contour2,%20int%20method,%20double%20parameter%29 – Micka Aug 21 '15 at 08:36
  • can you provide a typical sequence of "same shapes" images? – Micka Aug 21 '15 at 08:50
  • @Micka The end goal is to have the App understand the Blissymbolics language. It is a known language taught to people who struggle to communicate by speech. Some examples can be found here. http://www.the-symbols.net/blissymbolics/dictionary/ – Ste Prescott Aug 21 '15 at 09:49

5 Answers

5

First of all, this looks like a great app. And for a fantastic purpose. Good work!

To the specifics of your question: having watched the video, it seems like one approach would be as follows:

1. Divide the drawing region into (say) a 3x3 grid and allow each cell to contain a primitive: say a vertical line, horizontal line, square, circle, triangle or nothing at all. (This depends somewhat on the motor control of your friend's parent.)

2. When an image is complete, detect these primitives and encode a (say) 9-character key which can be used to retrieve the appropriate image. For example, if triangle is T, square is S and empty is an underscore, then the code for 'I'm going home' as per the video would be "_T__S____".

3. When a new image is started, you can detect each primitive as it's drawn and use it to construct a search key where the key has '?' for unknown cells. You can then quickly retrieve all possible matches from your database.

For example, if the user draws a triangle in the top-middle region, this would be encoded as '?T???????', and this would match '_T__S____' as well as '_TT______'.
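
A minimal sketch of that key matching, assuming each drawing has already been classified into a 9-character key string (the per-cell shape classification itself is not shown):

#include <string>
#include <vector>

// Sketch only: match a partially drawn key (with '?' wildcards) against a
// stored key. Both keys are assumed to be exactly 9 characters, one per grid
// cell, e.g. "_T__S____".
bool keyMatches(const std::string& partialKey, const std::string& storedKey)
{
    if (partialKey.size() != storedKey.size())
        return false;
    for (size_t i = 0; i < partialKey.size(); ++i) {
        if (partialKey[i] != '?' && partialKey[i] != storedKey[i])
            return false;
    }
    return true;
}

// Return every stored key the partial key could still turn into.
std::vector<std::string> candidates(const std::string& partialKey,
                                    const std::vector<std::string>& bank)
{
    std::vector<std::string> result;
    for (size_t i = 0; i < bank.size(); ++i) {
        if (keyMatches(partialKey, bank[i]))
            result.push_back(bank[i]);
    }
    return result; // e.g. "?T???????" matches both "_T__S____" and "_TT______"
}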

If constraining the user to draw into a smaller region of the screen is not feasible then you can still store an encoding key representing the relative positions of each primitive.

To do this you could calculate the centre of mass of each primitive, sort them left to right, top to bottom, and then store some representation of their relative positions, e.g. a triangle above a square might be TVS, where the V means that S is below T, and a triangle to the left of a square would use a different relation character in the same way.

Hope this helps.

Good luck!

Dave Durbin
  • Thanks for the suggestion. This is a clever way of using what I already have, but it would require the user to draw the shape at the same size every time. So if he drew a large square that hit all 9 segments, it would be a different shape from a smaller square that only hit 4 of the segments. – Ste Prescott May 09 '15 at 20:07
  • Do you have any constraints on whether shape sizes have meaning or shapes can overlap? It seems as if you could still use a relative position to encode a key, even without segmenting the drawing region, i.e. to capture the idea of a triangle above a square vs. alongside a square. – Dave Durbin May 11 '15 at 01:43
  • You can normalize your shapes to unit length by fitting a bounding box to the sketch and making its width/height to some unit length. You can fit a bounding box by detecting the left most, right most, top most and bottom most pixels, assuming there are no noisy dots, lines around. If there is some noise, you can get rid of it using some basic image processing such as find connected components and remove the ones which are only a few pixels (noisy dots etc). – rs_ Aug 21 '15 at 22:02
5
  1. Sketch-based image retrieval. There is quite an extensive literature on looking up real images using sketches as the query. Not quite what you want, but some of the same techniques can probably be adapted to look up sketches using a sketch as the query. They may even work without modification.

  2. Automatic recognition of handwritten Chinese characters (or similar writing systems). There is also quite a lot of literature on that; the problem is similar, since the writing system evolved from pictorial sketches, though much simplified and stylized. Try to apply some of the same techniques.

  3. The number, order and location of individual lines is probably more informative than the finished sketch as an image. Is there a way you can capture this? If your user is drawing with a stylus, you can probably record the stylus trajectory for each line. This has much, much more information content than the image itself. Think about someone drawing (e.g.) a car with their eyes closed: going by the trajectories, you can easily figure out it's a car; from the picture it may be much harder. (A rough sketch of capturing and comparing trajectories this way is given after this list.)

  4. If you can capture lines as described, then the matching problem can be, to some approximation, reduced to the problem of matching some of the lines in image A to the most similar lines in image B (possibly deformed, offset, etc). They should also have similar relationships with other lines: if (for example) two lines cross in image A, they should cross in image B, and at a similar angle and at a similar location along the length of each. To be more robust, this should ideally deal with things like two lines in one image corresponding to a single (merged) line in the other.
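
To make point 3 concrete, here is a minimal sketch of comparing two strokes captured as point sequences, in the spirit of the $1 unistroke recognizer: resample each stroke to a fixed number of points, normalise for position and scale, then average the point-to-point distances. The sample count and the crude index-based resampling are assumptions to refine:

#include <opencv2/core/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Resample a recorded stroke to n points (by index - a crude stand-in for
// proper arc-length resampling) and normalise it for position and scale.
// Assumes the stroke has at least one point.
static std::vector<cv::Point2f> normaliseStroke(const std::vector<cv::Point2f>& stroke, int n)
{
    std::vector<cv::Point2f> pts;
    for (int i = 0; i < n; ++i) {
        size_t idx = (size_t)((double)i * (stroke.size() - 1) / (n - 1));
        pts.push_back(stroke[idx]);
    }
    float minX = pts[0].x, maxX = pts[0].x, minY = pts[0].y, maxY = pts[0].y;
    for (size_t i = 1; i < pts.size(); ++i) {
        minX = std::min(minX, pts[i].x); maxX = std::max(maxX, pts[i].x);
        minY = std::min(minY, pts[i].y); maxY = std::max(maxY, pts[i].y);
    }
    float scale = std::max(maxX - minX, maxY - minY);
    if (scale <= 0.0f) scale = 1.0f;
    for (size_t i = 0; i < pts.size(); ++i)
        pts[i] = cv::Point2f((pts[i].x - minX) / scale, (pts[i].y - minY) / scale);
    return pts;
}

// Mean distance between corresponding points of two normalised strokes
// (lower = drawn more alike). Each stroke is the list of touch locations
// recorded while the stylus or finger was down.
double strokeDistance(const std::vector<cv::Point2f>& a,
                      const std::vector<cv::Point2f>& b,
                      int n = 32)
{
    std::vector<cv::Point2f> na = normaliseStroke(a, n), nb = normaliseStroke(b, n);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double dx = na[i].x - nb[i].x, dy = na[i].y - nb[i].y;
        sum += std::sqrt(dx * dx + dy * dy);
    }
    return sum / n;
}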

Alex I
5

Description based Approach

Based on your video, I am assuming that you are mostly interested in comparing line drawings rather than detailed sketches. For line drawings I can think of the following description-based approach. The proposed description is ratio based, so it does not depend on the absolute size/dimensions of the shape and should also handle small variations quite well.

Computing a Description for the Shape

You need to compute a description of the normalized shape which is resilient to small changes between different instances. There is a vast literature on shape matching and sketch-based retrieval, as mentioned in a previous answer, so I will not repeat that. Assuming that the shapes you are dealing with are either lines or polygonal shapes, the following relatively simple approach should work.

  1. Detect all lines in the image using the Hough transform. A very detailed example is given here. Hand-drawn lines may not be completely straight, and the Hough transform may not detect them as a single line but as multiple segments with different slopes and intercepts. You can either restrict the sketching to consist only of straight lines, or use a line-fitting approach that fits a single straight line to a slightly irregular hand-drawn stroke.

  2. Sort the lines based on X-coordinates first and then Y-coordinates.

  3. Traverse the lines in that fixed order (left to right, bottom to top) and compute the following properties for every line: the angle of the line (theta), the ratio of its length to a reference length such as the longest line in the sketch (lengthRatio), and the relative position along the line at which other lines intersect it (segmentRatio, zero when there is no intersection).

  4. Create a feature vector by concatenating the above values; for example, if the sketch had three lines, the feature vector would be of the form { theta1, theta2, theta3, lengthRatio1, lengthRatio2, lengthRatio3, segmentRatio1, segmentRatio2, segmentRatio3 }. (A rough sketch of this extraction step follows the list.)
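
A minimal sketch of this extraction step, assuming OpenCV's probabilistic Hough transform (HoughLinesP) and using the longest detected line as the length reference; the segmentRatio/intersection part is omitted for brevity and the Hough thresholds are placeholders:

#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Order lines by x first, then y, so features line up between drawings.
static bool byXThenY(const cv::Vec4i& a, const cv::Vec4i& b)
{
    if (a[0] != b[0]) return a[0] < b[0];
    return a[1] < b[1];
}

// Sketch only: detect straight lines in a binarised drawing and build the
// angle / lengthRatio part of the feature vector described above.
std::vector<float> extractLineFeatures(const cv::Mat& binaryDrawing)
{
    std::vector<cv::Vec4i> lines;
    // Placeholder thresholds; they will need tuning for stylus drawings.
    cv::HoughLinesP(binaryDrawing, lines, 1, CV_PI / 180, 30, 20, 10);
    std::sort(lines.begin(), lines.end(), byXThenY);

    std::vector<float> angles, lengths;
    float maxLength = 1e-6f;
    for (size_t i = 0; i < lines.size(); ++i) {
        float dx = (float)(lines[i][2] - lines[i][0]);
        float dy = (float)(lines[i][3] - lines[i][1]);
        angles.push_back(std::atan2(dy, dx));            // theta_i
        lengths.push_back(std::sqrt(dx * dx + dy * dy));
        maxLength = std::max(maxLength, lengths.back());
    }

    // Feature vector = { theta1..thetaN, lengthRatio1..lengthRatioN }
    std::vector<float> features(angles);
    for (size_t i = 0; i < lengths.size(); ++i)
        features.push_back(lengths[i] / maxLength);       // scale-invariant length
    return features;
}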

Matching Query and Database Shapes

As explained above, you can create the feature vector representation for all shapes in your database as well as for any new query. You can now write a simple function that computes the distance between two vectors. Pseudo-code is given below.

int numLines; // computed using the Hough transform
vector<float> featureVector1(vec_size); // vec_size = 3 * numLines
vector<float> featureVector2(vec_size);
...
// feature vectors computed as explained above //

// Compute the error between the two vectors //

float angleError = 0.0f, lengthRatioError = 0.0f, segmentRatioError = 0.0f;
float diff = 0.0f;

// elements (0, numLines-1) of the vector are angles
for (int i = 0; i < numLines; i++) {
    diff = fabs(featureVector1[i] - featureVector2[i]);
    angleError += diff;
}

// elements (numLines, 2*numLines-1) of the vector are length ratios
for (int i = numLines; i < 2*numLines; i++) {
    diff = fabs(featureVector1[i] - featureVector2[i]);
    lengthRatioError += diff;
}

// elements (2*numLines, 3*numLines-1) of the vector are segment ratios
// (these values are zero when there is no intersection)
for (int i = 2*numLines; i < 3*numLines; i++) {
    diff = fabs(featureVector1[i] - featureVector2[i]);
    segmentRatioError += diff;
}

// Weights for the errors - you should play around with these.
float w1 = 1.0f, w2 = 1.0f, w3 = 1.0f;
float totalError = w1*angleError + w2*lengthRatioError + w3*segmentRatioError;

Match the query feature vector against all database feature vectors and find the shape with the minimum distance (totalError). If that distance is below a threshold, declare a match; otherwise declare the query a new shape and add it to the database.
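
A possible sketch of that lookup, using a plain L1 distance as a stand-in for the weighted totalError above; the threshold is a placeholder to tune:

#include <cmath>
#include <vector>

// Sketch only: find the stored shape whose feature vector is closest to the
// query. Returns -1 when nothing is close enough, so the caller can treat the
// query as a new shape and add it to the database.
int findBestShape(const std::vector<float>& query,
                  const std::vector<std::vector<float> >& database,
                  float threshold /* placeholder, needs tuning */)
{
    int best = -1;
    float bestError = threshold;
    for (size_t i = 0; i < database.size(); ++i) {
        // Only shapes with the same number of lines have comparable vectors.
        if (database[i].size() != query.size())
            continue;
        // Plain L1 distance here; the weighted angle/length/segment error
        // from the pseudo-code above would slot in instead.
        float err = 0.0f;
        for (size_t j = 0; j < query.size(); ++j)
            err += std::fabs(query[j] - database[i][j]);
        if (err < bestError) {
            bestError = err;
            best = (int)i;
        }
    }
    return best;
}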

rs_
  • Thank you @rajvi for your detailed answer. I will look further into this. Are you OK to answer any further questions if they come up? – Ste Prescott Aug 22 '15 at 09:16
  • Yes, definitely. Please let me know if you need any clarifications. – rs_ Aug 22 '15 at 10:51
4

One method that may help, given how long the library has been around and how well it is maintained, is to use ImageMagick. Let me demonstrate.

1. Install the pod using

pod 'ImageMagick', '6.8.8-9'

2. In the view controller, or whatever other view you want to use to compare images, import the following:

#import <wand/MagickWand.h>

3. Create a convenience exception macro to check for errors without having to code up the exception handling every time:

#define ThrowWandException(wand) { \
char * description; \
ExceptionType severity; \
\
description = MagickGetException(wand,&severity); \
(void) fprintf(stderr, "%s %s %lu %s\n", GetMagickModule(), description); \
description = (char *) MagickRelinquishMemory(description); \
exit(-1); \
}

4. Create a comparison method to compare two images:

-(void)compareTwoImages:(UIImage*)firstImage secondImage:(UIImage*)secondImage comparitorSize:(size_t)comparitorSize {

    double diff1, diff2, diff3, diff4, diff5, diff6, diff7, diff8, diff9, diff10, diff11, diff12;

    MagickWandGenesis();
    MagickWand *magick_wand_1 = NewMagickWand();
    NSData * dataObject1 = UIImagePNGRepresentation(firstImage);
    MagickBooleanType status1;
    status1 = MagickReadImageBlob(magick_wand_1, [dataObject1 bytes], [dataObject1 length]);

    if (status1 == MagickFalse) {
        ThrowWandException(magick_wand_1);
    }

    MagickWandGenesis();
    MagickWand *magick_wand_2 = NewMagickWand();
    NSData * dataObject11 = UIImagePNGRepresentation(secondImage);
    MagickBooleanType status11;
    status11 = MagickReadImageBlob(magick_wand_2, [dataObject11 bytes], [dataObject11 length]);

    if (status11 == MagickFalse) {
        ThrowWandException(magick_wand_2);
    }

    MagickScaleImage(magick_wand_2, comparitorSize, comparitorSize);
    MagickScaleImage(magick_wand_1, comparitorSize, comparitorSize);

    MagickWandGenesis();
    MagickWand *magick_wand_3 = NewMagickWand();

    MagickCompareImages(magick_wand_1, magick_wand_2, UndefinedMetric, &diff1);
    MagickCompareImages(magick_wand_1, magick_wand_2, AbsoluteErrorMetric, &diff2);
    MagickCompareImages(magick_wand_1, magick_wand_2, MeanAbsoluteErrorMetric, &diff3);
    MagickCompareImages(magick_wand_1, magick_wand_2, MeanErrorPerPixelMetric, &diff4);
    MagickCompareImages(magick_wand_1, magick_wand_2, MeanSquaredErrorMetric, &diff5);
    MagickCompareImages(magick_wand_1, magick_wand_2, PeakAbsoluteErrorMetric, &diff6);
    MagickCompareImages(magick_wand_1, magick_wand_2, PeakSignalToNoiseRatioMetric, &diff7);
    MagickCompareImages(magick_wand_1, magick_wand_2, RootMeanSquaredErrorMetric, &diff8);
    MagickCompareImages(magick_wand_1, magick_wand_2, NormalizedCrossCorrelationErrorMetric, &diff9);
    MagickCompareImages(magick_wand_1, magick_wand_2, FuzzErrorMetric, &diff10);
    MagickCompareImages(magick_wand_1, magick_wand_2, UndefinedErrorMetric, &diff11);
    MagickCompareImages(magick_wand_1, magick_wand_2, PerceptualHashErrorMetric, &diff12);

    NSLog(@"UndefinedMetric: %.21f", diff1);
    NSLog(@"AbsoluteErrorMetric: %.21f", diff2);
    NSLog(@"MeanAbsoluteErrorMetric: %.21f", diff3);
    NSLog(@"MeanErrorPerPixelMetric: %.21f", diff4);
    NSLog(@"MeanSquaredErrorMetric: %.21f", diff5);
    NSLog(@"PeakAbsoluteErrorMetric: %.21f", diff6);
    NSLog(@"PeakSignalToNoiseRatioMetric: %.21f", diff7);
    NSLog(@"RootMeanSquaredErrorMetric: %.21f", diff8);
    NSLog(@"NormalizedCrossCorrelationErrorMetric: %.21f", diff9);
    NSLog(@"FuzzErrorMetric: %.21f", diff10);
    NSLog(@"UndefinedErrorMetric: %.21f", diff11);
    NSLog(@"PerceptualHashErrorMetric: %.21f", diff12);

    DestroyMagickWand(magick_wand_1);
    DestroyMagickWand(magick_wand_2);
    DestroyMagickWand(magick_wand_3);

    MagickWandTerminus();
}

5. Observe the output in the debugger. (Obviously, you'll need another method with some sort of "threshold" check to decide what level counts as an exact or close match versus what you consider a non-match.) Also, a very important note: the reason the method above takes a "size_t" size parameter is that you can't compare images of different sizes, so you must first pick a size you feel is reasonable; both images are then resized to it with ImageMagick before being compared:

Example 1:

[self compareTwoImages:[UIImage imageNamed:@"book.png"]
           secondImage:[UIImage imageNamed:@"book.png"]
        comparitorSize:32];

[76233:1364823] UndefinedMetric: 0.866871957624008593335

[76233:1364823] AbsoluteErrorMetric: 0.000000000000000000000

[76233:1364823] MeanAbsoluteErrorMetric: 0.000000000000000000000

[76233:1364823] MeanErrorPerPixelMetric: 0.000000000000000000000

[76233:1364823] MeanSquaredErrorMetric: 0.000000000000000000000

[76233:1364823] PeakAbsoluteErrorMetric: 0.000000000000000000000

[76233:1364823] PeakSignalToNoiseRatioMetric: inf

[76233:1364823] RootMeanSquaredErrorMetric: 0.866871957624008593335

[76233:1364823] NormalizedCrossCorrelationErrorMetric: 0.000000000000000000000

[76233:1364823] FuzzErrorMetric: 0.000000000000000000000

[76233:1364823] UndefinedErrorMetric: 0.866871957624008593335

[76233:1364823] PerceptualHashErrorMetric: 0.000000000000000000000

Example 2:

[self compareTwoImages:[UIImage imageNamed:@"book.png"]
             secondImage:[UIImage imageNamed:@"arrow.png"]
          comparitorSize:32];

[76338:1368754] UndefinedMetric: 0.074585376822533272501

[76338:1368754] AbsoluteErrorMetric: 795.000000000000000000000

[76338:1368754] MeanAbsoluteErrorMetric: 0.314410045058480136504

[76338:1368754] MeanErrorPerPixelMetric: 328395.000000000000000000000

[76338:1368754] MeanSquaredErrorMetric: 0.245338692857198115149

[76338:1368754] PeakAbsoluteErrorMetric: 1.000000000000000000000

[76338:1368754] PeakSignalToNoiseRatioMetric: 6.102339529383479899138

[76338:1368754] RootMeanSquaredErrorMetric: 0.074585376822533272501

[76338:1368754] NormalizedCrossCorrelationErrorMetric: 0.000000000000000000000

[76338:1368754] FuzzErrorMetric: 0.571942529580490965913

[76338:1368754] UndefinedErrorMetric: 0.074585376822533272501

[76338:1368754] PerceptualHashErrorMetric: 1827.005561849247442296473

There's quite a bit going on here mathematically. I don't have time to explain all these metrics, but suffice it to say that this comparison uses some very well-known methods for comparing two images. You'll have to pick it up from here, experiment with the statistics, customize this to your liking, and choose an error threshold that suits your purposes.

Quick explanation:

ImageMagick is a battle-tested image processing library, and while the methods listed above are something of a "black box", it is exactly that black-box quality that saves you time compared with building something on top of OpenCV. ImageMagick has long-established, very good image processing algorithms, and it's this history, along with the out-of-the-box "this method works" quality, that is one of its biggest benefits considering what it would take for you to develop your own image recognition/processing library or methods. (By the way, I'm not affiliated with ImageMagick; I'm just very happy with the product.)

The calls used in the Objective-C method above come from the ImageMagick library for iOS. You'll have to read up on them, but be aware that they are written in C, not Objective-C, which means some of this will feel foreign compared with the run-of-the-mill Image I/O or other processing libraries. However, the only part I see as hard to understand (for someone new to C code) is what the "&" symbol is doing in front of some variables. Other than that, there's the matter of declaring and using structs, but that is easily answered on Stack Overflow; there's a lot of information on it right here.

I don't like the idea of you having to write your own library or image processing algorithms for an app like this. You are doing a very good thing for a person in need, and it will help a great number of people if you can lock down an MVP; using ImageMagick is going to get you there a lot sooner than rewriting, for your own purposes, things ImageMagick already does.

One last thing to note: ImageMagick is built on some very low-level C libraries. I would need to run a benchmark of some sort to determine how well ImageMagick performs on iOS devices versus something like Image I/O, but I have a feeling the two libraries share much of the same functionality and speed. Correct me if anyone is certain that Image I/O is definitely faster in tests; I'm merely telling you this so you know that this ISN'T your average pod install. It's a robust piece of machinery that, by the grace of some iOS devs, has an iOS version you can basically drop in using CocoaPods. Beyond this, ImageMagick is used on all computing platforms and is largely a command line tool.

Statistics, what the numbers mean:

Here are the resources you'll need to understand the math behind the statistics shown above in the debugger output:

This first one is old, but still relevant:

http://www.ict.griffith.edu.au/anthony/info/graphics/image_comparing

This one seems to be up to date:

http://www.imagemagick.org/Usage/compare/

Whether you choose to use this library or another, good luck to you. I like where you are going with this app, and your demo looks amazing!

EDIT: I almost forgot to tell you the most important part here: if the statistics return all zeros for everything except "UndefinedMetric", "PeakSignalToNoiseRatioMetric", "RootMeanSquaredErrorMetric", and "UndefinedErrorMetric", then you more than likely have a MATCH!
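
As a rough illustration of that rule of thumb, a small helper like the following could make the decision; the near-zero tolerance is a placeholder you would tune against your own test drawings:

// Sketch only: decide whether two drawings "match" based on the observation
// above, i.e. the pixel-difference metrics come back (near) zero. The
// tolerance is a placeholder; tune it against known matching and
// non-matching drawings.
bool imagesLookLikeMatch(double meanAbsoluteError,
                         double meanSquaredError,
                         double peakAbsoluteError,
                         double perceptualHashError)
{
    const double tolerance = 1e-3; // placeholder threshold
    return meanAbsoluteError   < tolerance &&
           meanSquaredError    < tolerance &&
           peakAbsoluteError   < tolerance &&
           perceptualHashError < tolerance;
}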

Larry Pickles
  • Wow, thanks for the detailed answer. Will have a go and update you. – Ste Prescott Aug 25 '15 at 21:11
  • No problem, if you have questions just ask, it was sort of an adventure to get this code to fly correctly with IOS but it's locked down to the point right now where it should literally be a drop in for you if you follow the instructions above. – Larry Pickles Aug 25 '15 at 21:13
  • This is a great example of how to use ImageMagick's `compare` functions under iOS, so thank you for that and have a vote from me... but I fear that they are not the right tool as they compare pixel by pixel how exactly two images match, so any small change in shape, size, position, rotation, or any distortion will result in images not matching - try making two images and shifting one over by 3mm and this method will be very poor. I think a SIFT technique would be much better https://en.wikipedia.org/wiki/Scale-invariant_feature_transform – Mark Setchell Aug 26 '15 at 09:01
  • Thanks for the further information. I will look into the SIFT method. – Ste Prescott Aug 26 '15 at 21:59
  • @MarkSetchell If you provide a more detailed answer with some resources you found useful there is still the 100 rep bounty to give. There is only 11 hours left for the bounty. – Ste Prescott Aug 27 '15 at 08:03
  • One advantage that you may be able to benefit from is that you actually can detect the drawing of lines as they happen - so you know the direction and speed of the lines which must somehow be an advantage over algorithms which are presented with a *fait accompli* of a finished drawing. The most interesting paper I found is here... https://www.cs.ucsb.edu/~sherwood/pubs/SBIM-12-spark.pdf I feel ill-qualified to make a proper answer so I'll stick with comments. – Mark Setchell Aug 27 '15 at 15:45
2

I'd strongly recommend using a bag-of-words model.

If you want to use a model like this to recognise, say, a bike, it would subdivide the image into its parts (saddle, wheels, frame, handlebars). If it then sees another image that contains two wheels, a saddle and a frame, it is likely to fit the picture to the bike (as quite a few of the words match). The problem with this approach is that the spatial relations between the words are not taken into account.

In this specific case I believe this would work reasonably well, as sketches tend to be a combination of many smaller (recognisable) objects. Instead of having to detect a bike all the time, you can re-use the smaller detection parts (say a wheel) for other things that need to be detected (motorcycle, car, etc.).

Edit: OpenCV even has an implementation of the bag-of-words model. It can be found here.
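
To sketch how that might look with OpenCV 2.4's BoW classes (this is only an outline: SURF lives in the nonfree module and is used here because the BoW k-means trainer needs float descriptors, and the vocabulary size is a placeholder):

#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <vector>

// Sketch only: build a visual vocabulary from the bank of known drawings and
// describe a new drawing as a histogram over that vocabulary ("bag of visual
// words"). Compare histograms (e.g. with cv::compareHist) to match drawings.
cv::Mat bowHistogram(const cv::Mat& drawing,
                     const std::vector<cv::Mat>& knownDrawings,
                     int vocabularySize = 50)
{
    cv::SurfFeatureDetector detector(400);
    cv::Ptr<cv::DescriptorExtractor> extractor = new cv::SurfDescriptorExtractor();
    cv::Ptr<cv::DescriptorMatcher> matcher =
        cv::DescriptorMatcher::create("FlannBased");

    // 1. Cluster descriptors from the known drawings into a vocabulary.
    cv::BOWKMeansTrainer trainer(vocabularySize);
    for (size_t i = 0; i < knownDrawings.size(); ++i) {
        std::vector<cv::KeyPoint> kp;
        cv::Mat desc;
        detector.detect(knownDrawings[i], kp);
        extractor->compute(knownDrawings[i], kp, desc);
        if (!desc.empty())
            trainer.add(desc);
    }
    cv::Mat vocabulary = trainer.cluster();

    // 2. Describe the new drawing as a normalised word histogram.
    cv::BOWImgDescriptorExtractor bow(extractor, matcher);
    bow.setVocabulary(vocabulary);
    std::vector<cv::KeyPoint> kp;
    cv::Mat histogram;
    detector.detect(drawing, kp);
    bow.compute(drawing, kp, histogram);
    return histogram;
}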

Nallath