
I need to do computer vision tasks in order to detect water bottles or soda cans. I will obtain 'frontal' images of bottles, soda cans or any other random objects (one by one) and my algorithm should determine whether it's a bottle, a can or neither of them.

Some details about the object detection scenario:

  • As mentioned, I will test a single object per image/video frame.
  • Not all water bottles are the same. There may be variation in plastic color, lid or label. Some may have no label or lid at all.
  • The same goes for soda can variation. No wrinkled soda cans will be tested, though.
  • There could be small size variation between objects.
  • I could have a green (or any custom color) background.
  • I will apply any needed filters to the image.
  • This will be run on a Raspberry Pi.

Just in case, an example of each:

[example images: a water bottle and a soda can]

I've tested OpenCV's face detection algorithms a couple of times and I know they work pretty well, but with that approach I'd need to obtain a special Haar cascade features XML file for each custom object I want to detect.

So, I have a few distinct alternatives in mind. I'd like a simple algorithm, and I think creating a custom Haar classifier might not even be needed. What would you suggest?

Update

I strongly considered the shape/aspect ratio approach.

However, I'm facing some issues, as bottles come in distinct sizes and even shapes. This led me to the following considerations:

  • I'm applying a threshold with the THRESH_BINARY method (thanks to the answers).
  • I will use a white background for detection.
  • Soda cans are all the same size.
  • So, a tight bounding box for soda cans might distinguish a can with high accuracy.

What I've achieved:

Thresholding really helped me. On white-background tests, this is what I would obtain for cans:

[threshold results for cans]

And this is what is obtained for bottles:

[threshold results for bottles]

So, the dominance of darker areas is noticeable. There are some cases with cans where this might turn into false negatives, and for bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter approach.

So, I'm quite confused now about how I should evaluate that darkness dominance. I've read that findContours helps with this, but I'm quite lost on how to use that function. For example, in the case of soda cans, it may find several contours, so I get lost on what to evaluate.
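For reference, this is roughly what I have in mind (a sketch only; the file name and threshold value are placeholders): keep just the largest external contour, then measure how dark its bounding box is.

    import cv2

    img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

    # THRESH_BINARY_INV makes the dark object white so findContours picks it up.
    _, mask = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

    # Note: OpenCV 3.x returns (image, contours, hierarchy) instead.
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Several contours may appear (glare, label edges); the object
        # itself should be the one with the largest area.
        largest = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(largest)

        # "Darkness dominance": fraction of the bounding box covered by dark pixels.
        dark_fraction = cv2.countNonZero(mask[y:y + h, x:x + w]) / float(w * h)
        print(h / float(w), dark_fraction)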

Note: I'm open to testing any other algorithms or libraries besides OpenCV.

diegoaguilar

5 Answers


I see a few basic ideas here:

  1. Check the object's (to be precise, the object's bounding rect's) height/width ratio. For a can it's approximately 2-2.5; for a bottle I think it will be >3. It's a very simple idea, so it should be easy to test quickly, and I think it should have quite good accuracy. For in-between values, like 2.75 (assuming that the values I gave are correct, which most likely isn't true), you can fall back to some different algorithm; see the sketch after this list.
  2. Check whether your object contains glass/transparent regions - if yes, then it's definitely a bottle. Here you can read more about it.
  3. Use the grabcut algorithm to get the object mask/a more precise shape, and check whether the shape's width at the top is similar to the width at the bottom - if yes, it's a can; if no, it's a bottle (bottles have a screw cap at the top).
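A quick Python sketch of idea 1 (it assumes the thresholding has been done already; the cut-off ratios are rough guesses, not tested values):

    import cv2

    def classify_by_aspect_ratio(mask):
        """Classify from the bounding rect of the largest contour.

        mask: binary image with the object in white.
        """
        contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                               cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return "unknown"
        largest = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(largest)
        ratio = h / float(w)
        if ratio <= 2.5:       # rough can range
            return "can"
        if ratio >= 3.0:       # rough bottle range
            return "bottle"
        return "unknown"       # ambiguous zone - fall back to another check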
cyriel
  • Thanks, I'm quite new to image processing and OpenCV. Could you suggest some specific useful functions for each idea? – diegoaguilar Apr 02 '15 at 14:19

Since you want to recognize can vs. bottle rather than Pepsi vs. Coke, shape matching is probably the way to go, compared to Haar and the features2d matchers like SIFT/SURF/ORB.

A unique background color will make things easier.

First, create a histogram from an image of just the background:

int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
float _range[] = {0,256}; // upper bound of the range is exclusive
float* ranges[] = {_range, _range, _range};

cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );

Then use calcBackProject to create a mask of background vs. not-background:

cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );

cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
    cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
    //a small blur here could work nicely
    threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
    bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
    //finding just the right value here might be better than the above method
    int magic_threshold = 64; 
    temp_ND.convertTo( bottle_backproj, CV_8U, 255.); 
    //I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
    threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}

Then either:

Compare bottle_mask or bottle_backproj to a few sample bottle masks/backprojections using matchTemplate with a threshold on confidence to decide if it's a match.

cv::Mat result;
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence;
minMaxLoc( result, NULL, &confidence );

Or use matchShapes, though I've never gotten this to work properly.

double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);

Or use linemod, which is difficult to set up but works great for images like this, where the shape isn't very complex. Aside from the linked file, I haven't found any working samples of this method, so here's what I did.

First create/train the detector with some sample images

//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4); 
T_at_level.push_back(8);

//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
    width += T - width % T;

int height = bottle_mask.rows;
if( height % T != 0)
    height += T - height % T;

//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type());
// note: assigning to a temporary ROI header would not copy pixels, so use copyTo
template_backproj.copyTo( padded_backproj( padded_roi ) );

cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );
//you might need to erode padded_mask by a few pixels.

//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);

//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj );
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);

Then do some matching

std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;

new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
dirtbag
    Some more links, [calcBackProject](http://docs.opencv.org/doc/tutorials/imgproc/histograms/back_projection/back_projection.html) and [matchShapes](http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?#matchshapes) – dirtbag Apr 03 '15 at 17:45

As cyriel suggests, the aspect ratio (width/height) might be one useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you aspect ratio and some other measurements:

    # src image should have already had some contrast enhancement (such as
    # cv2.threshold) and edge finding (such as cv2.Canny)
    contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        num_points = len(contour)
        if num_points < 5:
            # The contour has too few points to fit an ellipse. Skip it.
            continue

        # We could use area to help determine the type of object.
        # Small contours are probably false detections (not really a whole object).
        area = cv2.contourArea(contour)

        bounding_ellipse = cv2.fitEllipse(contour)
        center, radii, angle_degrees = bounding_ellipse

        # Let's define an ellipse's normal orientation to be landscape (width > height).
        # We must ensure that the ellipse's measurements match this orientation.
        if radii[0] < radii[1]:
            radii = (radii[1], radii[0])
            angle_degrees -= 90.0

        # We could use the angle to help determine the type of object.
        # A bottle or can's angle is probably approximately a multiple of 90 degrees,
        # assuming that it is at rest and not falling.

        # Calculate the aspect ratio (width / height).
        # For example, 0.5 means the object's height is 2 times its width.
        # A bottle is probably taller than a can.
        aspect_ratio = radii[0] / radii[1]

For checking transparency, you can compare the picture to a known background using histogram analysis or background subtraction.
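For example, here is a small sketch using a plain difference against a stored background shot (the file names and the difference threshold are placeholder assumptions):

    import cv2

    bg = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

    # Pixels that barely differ from the known background are either true
    # background or background seen *through* a transparent region.
    diff = cv2.absdiff(frame, bg)
    _, changed = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Within the object's bounding box (found via contours as above), a high
    # proportion of unchanged pixels suggests transparency - likely a bottle.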

The contour's moments can be used to determine its centroid (center of gravity):

    moments = cv2.moments(contour)
    m00 = moments['m00']
    m01 = moments['m01']
    m10 = moments['m10']
    centroid = (m10 / m00, m01 / m00)

You could compare this to the center. If the object is bigger ("heavier") on one end, the centroid will be closer to that end than the center is.
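For example (reusing contour and the moment values from the snippets above):

    x, y, w, h = cv2.boundingRect(contour)
    box_center_y = y + 0.5 * h
    centroid_y = m01 / m00

    # A can's mass is spread evenly, so its centroid stays near the geometric
    # center; a bottle's wide body and narrow neck pull the centroid away
    # from the center along the object's long axis.
    normalized_offset = (centroid_y - box_center_y) / h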

  • I tried loading distinct images, but it's getting fewer than 5 points. This is what I did: https://gist.github.com/diegoaguilar/cb008e2a6037b3a71c1a – diegoaguilar Apr 14 '15 at 22:04
  • Before using cv2.findContours, we should apply some contrast enhancement (such as cv2.threshold to convert the image to binary black and white) and edge detection (such as cv2.Canny). [Edited above.] – Joseph Howse Apr 17 '15 at 18:09

So, my main approach for detection was:

Bottles are transparent and cans are opaque

Generally, the algorithm consisted of:

  1. Take a grayscale picture.

  2. Apply a binary threshold.

  3. Select a convenient ROI from it.

  4. Obtain its color mean and, optionally, the standard deviation.

  5. Distinguish.

Implementation was basically reduced to this function (where CAN and BOTTLE were previously defined):

int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {

  Mat img;
  vector<Mat> channels;
  Rect r = Rect(x, y, width, height);

  if ( !capture ) {
    fprintf( stderr, "ERROR: capture is NULL \n" );
    getchar();
    return -1;
  }

  img = Mat(cvQueryFrame( capture ));
  // Captured frames are in BGR order, so convert with CV_BGR2GRAY.
  cvtColor(img, img, CV_BGR2GRAY);
  threshold(img, img, 127, 255, THRESH_BINARY);

  // Mean intensity over the ROI: an opaque can leaves a large dark
  // region, so its mean falls below the threshold value.
  Mat roiImage = img(r);
  split(roiImage, channels);
  Scalar m = mean(channels[0]);
  float media = m[0];
  printf("Media: %f\n", media);

  if (media < thresholdValue) {
    return CAN;
  } else {
    return BOTTLE;
  }
}

As can be seen, a THRESH_BINARY threshold was applied, and a plain white background was used. However, the main and critical issue I faced with this whole approach and algorithm was luminosity changes in the environment, even minor ones.

Sometimes I noticed that THRESH_BINARY_INV might help more, but I wonder whether I could use certain threshold parameters, or whether applying other filters may get rid of environment lighting as an issue.
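One thing I might try next (a sketch only, in Python for brevity; not something I have validated): letting OpenCV choose the threshold per frame with Otsu's method, or comparing each pixel against its local neighborhood with adaptiveThreshold instead of a fixed 127.

    import cv2

    gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

    # Otsu recomputes the global threshold from each frame's histogram,
    # so a uniform brightness shift doesn't break a hard-coded value.
    _, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # adaptiveThreshold compares each pixel with the mean of its 51x51
    # neighborhood minus a small constant, tolerating uneven lighting.
    adaptive = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY, 51, 10)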

I really appreciate the aspect-ratio approach based on the bounding box or contour finding, but I found this one straightforward and simple once conditions were adjusted.

diegoaguilar

I'd use deep learning, based on transfer learning.

The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically over a large public dataset, like ImageNet), you can freeze the majority of its weights and only train the last layers. There are lots of tutorials out there. You don't need to have a background in deep learning.

There is an almost out-of-the-box tutorial with TensorFlow here, and here there is another based on Keras.
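For illustration, a minimal Keras-style sketch of the idea (the model choice, sizes, and folder layout are placeholders, not taken from either tutorial):

    import tensorflow as tf

    # Frozen ImageNet-pretrained base; only the new classification head is trained.
    base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                             include_top=False, weights="imagenet")
    base.trainable = False

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # bottle vs. can
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Expects train_data/bottle/*.jpg and train_data/can/*.jpg (hypothetical paths).
    train = tf.keras.utils.image_dataset_from_directory(
        "train_data", image_size=(224, 224), batch_size=16)
    model.fit(train, epochs=5)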

Rodrigo Laguna