Object Recognition by Outlines vs Features

Question

Context:

I have RGB-D video from a Kinect aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection on the RGB image. Before applying SURF, I preprocess by downscaling to 320x240, converting to grayscale, stretching the contrast and balancing the histogram. I built a lasso tool to choose among the keypoints detected in a still of the video image. Those keypoints are then used to build object descriptors, which are used to identify objects in the live video feed.

Problem:

SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent, but they are mostly unimportant surface features. For instance, say I have a wooden cube. SURF detects a few bits of grain on one face, then fails on the other faces. I need to detect (something like) that there are four corners at equal distances and right angles. None of my objects has much of a pattern, but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin.

My thought was that I could build an object descriptor for each significantly different-looking orientation of an object, e.g. two descriptors for a bowling pin: one standing up and one lying down. For a cellphone, one lying on its front and one on its back.

My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. The ability to deal with some occlusion is preferable (SURF behaves well enough) but is not the most important characteristic. Skew invariance would also be preferable, and SURF does well with paper printouts of my objects held by hand at a skew.

Questions:

Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?

score 2 · Answer 1 · edited May 23 '17 at 12:16


I was doing something similar for a project and ended up using a very simple method for object recognition: OpenCV blob detection, with objects recognized by their areas. Obviously, the objects' areas need to differ enough from one another for this method to work.

You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
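For illustration, the area-based idea can be sketched without OpenCV at all. The snippet below is a minimal, dependency-free sketch and not the code used in the project linked above: it flood-fills each blob in a binary mask, counts its pixels, and picks the library object with the closest reference area. The names `blob_areas`, `classify_by_area` and `LIBRARY`, the reference areas, and the 20% tolerance are all assumptions made for the example.

```python
from collections import deque


def blob_areas(mask):
    """Return the pixel area of each 4-connected blob of truthy cells in a 2D grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one blob, counting its pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def classify_by_area(area, library, tolerance=0.2):
    """Return the name with the closest reference area, or None if the
    relative difference exceeds `tolerance`."""
    name, ref = min(library.items(), key=lambda item: abs(item[1] - area))
    return name if abs(ref - area) <= tolerance * ref else None


# Hypothetical reference areas in pixels, recorded once per object.
LIBRARY = {"cube": 9, "phone": 18}

# Toy binary mask: a 3x3 blob and a 3x6 blob separated by one empty column.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

labels = [classify_by_area(a, LIBRARY) for a in blob_areas(mask)]
print(labels)
```

In a real pipeline the mask would come from thresholding the camera image, and OpenCV's own blob or connected-component routines would replace the flood fill. Area alone is rotation invariant but not scale invariant, so stacked objects would need reference areas per height.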

I know there are other methods out there. One possible solution for you could be approxPolyDP, which is described in the question "How to detect simple geometric shapes using OpenCV".
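approxPolyDP is based on the Douglas-Peucker polyline simplification. As a rough idea of how that can answer "four corners at equal distances and right angles", here is a minimal pure-Python sketch of the same simplification followed by a square test. It is a stand-in written for this example, not OpenCV's implementation; `simplify_closed`, `looks_like_square`, the tolerance values and the toy contour are all assumptions.

```python
import math


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return _dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return _dist(p, (a[0] + t * dx, a[1] + t * dy))


def simplify_open(points, epsilon):
    """Douglas-Peucker on an open polyline; both endpoints are kept."""
    if len(points) < 3:
        return list(points)
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > worst:
            index, worst = i, d
    if worst <= epsilon:
        return [points[0], points[-1]]
    left = simplify_open(points[:index + 1], epsilon)
    right = simplify_open(points[index:], epsilon)
    return left[:-1] + right


def simplify_closed(contour, epsilon):
    """Simplify a closed contour by splitting it at two far-apart points."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    # Start at the point farthest from the centroid so the split is on a corner.
    start = max(range(len(contour)), key=lambda i: _dist(contour[i], (cx, cy)))
    ring = contour[start:] + contour[:start]
    far = max(range(len(ring)), key=lambda i: _dist(ring[i], ring[0]))
    first = simplify_open(ring[:far + 1], epsilon)
    second = simplify_open(ring[far:] + [ring[0]], epsilon)
    return first[:-1] + second[:-1]


def looks_like_square(vertices, tolerance=0.1):
    """True for four vertices with near-equal sides and near-right angles."""
    if len(vertices) != 4:
        return False
    edges = [(vertices[(i + 1) % 4][0] - vertices[i][0],
              vertices[(i + 1) % 4][1] - vertices[i][1]) for i in range(4)]
    lengths = [math.hypot(ex, ey) for ex, ey in edges]
    if min(lengths) == 0:
        return False
    mean = sum(lengths) / 4
    if any(abs(length - mean) > tolerance * mean for length in lengths):
        return False
    for i in range(4):
        (ax, ay), (bx, by) = edges[i], edges[(i + 1) % 4]
        cosine = (ax * bx + ay * by) / (lengths[i] * lengths[(i + 1) % 4])
        if abs(cosine) > tolerance:
            return False
    return True


# Toy contour: boundary pixels of a 4x4 square, starting mid-edge.
square = [(2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (3, 4),
          (2, 4), (1, 4), (0, 4), (0, 3), (0, 2), (0, 1), (0, 0), (1, 0)]
corners = simplify_closed(square, epsilon=0.5)
print(corners, looks_like_square(corners))
```

The same vertex list could feed other checks (vertex count, side ratios) for objects that are not square, and the test is independent of rotation because it only looks at side lengths and angles.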

Would love to hear about your progress on this!


answered Feb 06 '14 at 02:46

JackKalish


Jack, thanks for your response and examples. I'm now taking advantage of the depth image and working on a recognizer using some established descriptors that use histograms of the surface normals around selected keypoints. New computer, new libraries and new system from the ground up :) https://github.com/jbeuckm/kinect_pcl_osc_qt – Joe Beuckman Feb 18 '14 at 01:22

Cool! Have you seen this? http://web.missouri.edu/hantx/paper/Tang_Wang_Lv_Han_accv12.pdf – JackKalish Feb 19 '14 at 03:29
SHOT and other histogram descriptors are very interesting and very CPU intensive. I'm using SHOT Color descriptors, but I prioritize which objects I attempt to match by running cv::matchShapes against a set of recorded contour captures. Contour matching on the thresholded depth image is showing a lot of promise. It's actually as good as or better than SHOT for my objects and runs thousands of times faster. – Joe Beuckman Mar 05 '14 at 06:32
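To give a feel for why shape matching on a thresholded depth image is cheap and rotation invariant, here is a minimal pure-Python sketch. It is not the commenter's code and not OpenCV's metric: cv::matchShapes compares Hu moment invariants of contours, while this simplified stand-in computes only the first two Hu invariants from the mask pixels and compares them directly. The function names, the millimetre depth units, the 10 mm height margin, and the convention that a depth of 0 means "no reading" are assumptions for the example.

```python
def threshold_depth(depth, table_mm, min_height_mm=10):
    """Mark pixels that are at least `min_height_mm` closer to the camera
    than the table. A depth of 0 is treated as an invalid reading."""
    limit = table_mm - min_height_mm
    return [[1 if 0 < d < limit else 0 for d in row] for row in depth]


def shape_signature(mask):
    """First two Hu moment invariants of the foreground pixels.
    They are unchanged by translation, rotation and uniform scaling."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    # Normalise by area squared to remove the dependence on scale.
    eta20, eta02, eta11 = mu20 / n ** 2, mu02 / n ** 2, mu11 / n ** 2
    h1 = eta20 + eta02
    h2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return h1, h2


def best_match(mask, library):
    """Return the library name whose signature is closest to the mask's."""
    h1, h2 = shape_signature(mask)

    def distance(name):
        r1, r2 = library[name]
        return abs(h1 - r1) + abs(h2 - r2)

    return min(library, key=distance)


# Hypothetical recorded captures: a 4x4 square and a bar lying on its side.
library = {
    "square": shape_signature([[1] * 4 for _ in range(4)]),
    "bar": shape_signature([[1] * 8 for _ in range(2)]),
}

# Toy depth frame: table at 1000 mm, an upright 8x2 bar at 940 mm,
# and one invalid pixel with depth 0.
depth = [[1000] * 6 for _ in range(10)]
for y in range(1, 9):
    for x in (2, 3):
        depth[y][x] = 940
depth[0][0] = 0

mask = threshold_depth(depth, table_mm=1000)
print(best_match(mask, library))
```

Because the signature is two numbers per object, comparing against a dozen recorded captures is a handful of subtractions per frame, which fits the speed difference described in the comment above. Shapes that share similar second-order moments would need the higher-order invariants as well.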
