4

I have been trying to implement the meanshift algorithm for tracking objects, and have gone through the concepts involved.

As per now I have managed to successfully generate a backprojected stream from my camera with a single channel hue roi histogram and a single channel hue video stream which seems fine, I know there is a meanshift function within the opencv library but I am trying to implement one myself using the data structures provided in opencv, calculating the moments and computing the mean centroid of the search window.

But for some reason I am unable to locate the problem within my code as it keeps on converging to the upper left corner of my video stream for any input roi (region of interest) to be tracked. Following is a code snippet of the function for calculating the centroid of the search window where I feel the problem lies but not sure what it is, I would really appreciate if someone can point me in the right direction:

void moment(Mat &backproj, Rect &win){

    int x_c, y_c, x_c_new, y_c_new;    
    int idx_row, idx_col;
    double m00 = 0.0 , m01 = 0.0 , m10 = 0.0 ;
    double res = 1.0, TOL = 0.003 ;

    //Set the center of search window as the center of the probabilistic image:
    y_c =  (int) backproj.rows / 2 ; 
    x_c =  (int) backproj.cols / 2 ; 

    //Centroid search solver until residual below certain tolerance:
    while (res > TOL){

        win.width = (int) 80; 
        win.height = (int) 60; 

        //First array element at position (x,y) "lower left corner" of the search window:
        win.x = (int) (x_c - win.width / 2) ;
        win.y = (int) (y_c - win.height / 2); 

        //Modulo correction since modulo of negative integer is negative in C:
        if (win.x < 0)
                win.x = win.x % backproj.cols + backproj.cols ;

        if (win.y < 0)
                win.y = win.y % backproj.rows + backproj.rows ;   

        for (int i = 0; i < win.height; i++ ){  

                //Traverse along y-axis (height) i.e. rows ensuring wrap around top/bottom boundaries:                  
                idx_row = (win.y + i) % (int)backproj.rows ;

                for (int j = 0; j < win.width; j++ ){

                        //Traverse along x-axis (width) i.e. cols ensuring wrap around left/right boundaries:
                        idx_col = (win.x + j) % (int)backproj.cols ;    
                        //Compute Moments:                            
                        m00 += (double) backproj.at<uchar>(idx_row, idx_col) ;
                        m10 += (double) backproj.at<uchar>(idx_row, idx_col) * i ;
                        m01 += (double) backproj.at<uchar>(idx_row, idx_col) * j ;
                }
        }

        //Compute new centroid coordinates of the search window:
        x_c_new = (int) ( m10 / m00 ) ;
        y_c_new = (int) ( m01 / m00 );

        //Compute the residual:
        res = sqrt( pow((x_c_new - x_c), 2.0) + pow((y_c_new - y_c), 2.0) ) ;

        //Set new search window centroid coordinates:
        x_c = x_c_new;
        y_c = y_c_new;
    }
}

It's my second ever query on stackoverflow so please excuse me for any guidelines that I forgot to follow.

EDIT

changed m00 , m01 , m10 to block level variables within WHILE-LOOP instead of function level variables, thanks to Daniel Strul for pointing it out but the problem still remains. Now the search window jumps around the frame boundaries instead of focusing on the roi.

    void moment(Mat &backproj, Rect &win){

    int x_c, y_c, x_c_new, y_c_new;    
    int idx_row, idx_col;
    double m00 , m01 , m10 ;
    double res = 1.0, TOL = 0.003 ;

    //Set the center of search window as the center of the probabilistic image:
    y_c =  (int) backproj.rows / 2 ; 
    x_c =  (int) backproj.cols / 2 ; 

    //Centroid search solver until residual below certain tolerance:
    while (res > TOL){

        m00 = 0.0 , m01 = 0.0 , m10 = 0.0
        win.width = (int) 80; 
        win.height = (int) 60; 

        //First array element at position (x,y) "lower left corner" of the search window:
        win.x = (int) (x_c - win.width / 2) ;
        win.y = (int) (y_c - win.height / 2); 

        //Modulo correction since modulo of negative integer is negative in C:
        if (win.x < 0)
                win.x = win.x % backproj.cols + backproj.cols ;

        if (win.y < 0)
                win.y = win.y % backproj.rows + backproj.rows ;   

        for (int i = 0; i < win.height; i++ ){  

                //Traverse along y-axis (height) i.e. rows ensuring wrap around top/bottom boundaries:                  
                idx_row = (win.y + i) % (int)backproj.rows ;

                for (int j = 0; j < win.width; j++ ){

                        //Traverse along x-axis (width) i.e. cols ensuring wrap around left/right boundaries:
                        idx_col = (win.x + j) % (int)backproj.cols ;    
                        //Compute Moments:                            
                        m00 += (double) backproj.at<uchar>(idx_row, idx_col) ;
                        m10 += (double) backproj.at<uchar>(idx_row, idx_col) * i ;
                        m01 += (double) backproj.at<uchar>(idx_row, idx_col) * j ;
                }
        }

        //Compute new centroid coordinates of the search window:
        x_c_new = (int) ( m10 / m00 ) ;
        y_c_new = (int) ( m01 / m00 );

        //Compute the residual:
        res = sqrt( pow((x_c_new - x_c), 2.0) + pow((y_c_new - y_c), 2.0) ) ;

        //Set new search window centroid coordinates:
        x_c = x_c_new;
        y_c = y_c_new;
    }
}
Ragesam
  • 43
  • 5

1 Answers1

1

The reason your algorithms always converges to the upper left corner independently of the input data is that m00, m10 and m01 are never reset to zero:

  • On iteration 0, for each moment variable m00, m10 and m01, you compute the right value m0
  • Between iteration 0 and iteration 1 , the moments variables are not reset and keep their previous value
  • Thus, on iteration 1, for each moment variable m00, m10 and m01, you actually sum the new moment with the old one and obtain ( m0 + m1 )
  • On iteration 2, you carry on summing the new moments on top of the previous ones and obtain ( m0 + m1 + m2 )
  • And so on, iteration by iteration.

At the very least, the moment variables should be reset at the beginning of each iteration.

Ideally, they should not be function-level variables but should rather be block-level variables, as they have no use outside the loop iterations (except for debugging purpose):

while (res > TOL){
    ...
    double m00 = 0.0, m01 = 0.0, m10 = 0.0;
    for (int i = 0; i < win.height; i++ ){
        ...

EDIT 1

The reason for the second problem you encounter (the ROI jumping all around the place) is that the computations of the moments are based on the relative coordinates i and j.

Thus, what you compute is [ avg(j) , avg(i) ], wher as what you really want is [ avg(y) , avg(x) ]. To solve this issue, I had proposed a first solution. I"ve replaced it by a much simpler solution below.

EDIT 2 The simplest solution is to add the coordinates of the ROI corner right at the end of each iteration:

    x_c_new = win.x + (int) ( m10 / m00 ) ;
    y_c_new = win.y + (int) ( m01 / m00 );
Daniel Strul
  • 1,458
  • 8
  • 16
  • Thanks for pointing another error out I thought it should work after that but now the search window jumps around the edges instead of focusing on the roi. Well the algorithm is supposed to find the probability distribution modes within the backprojected image and center the search window onto this mode. – Ragesam Oct 27 '15 at 09:22
  • Ok, I'll have a look. As a good practice on Stack Overflow, you should rather leave the original code as it was. Then, below it, you add **EDIT** and append the description and the corrected code of the new problem that has come up. This way, the answers remain consistent with the evolution of the question – Daniel Strul Oct 27 '15 at 09:53
  • Oops my bad :P thanks for the guidance Daniel. Should I leave it as it is now or re-edit it ? Moreover, I am not sure if I can post a git link to the whole project here? since I feel it will be helpful for anyone who wants to run the whole code. – Ragesam Oct 27 '15 at 12:36
  • @Ragesam It's really better if you re-edit. For now, I have not checked the new issue, just clarified my previous answer. As for the the rules for posting github links, I don't know the rules really. I actually need some more information to proceed, but not the full project: - what is the structure of `Mat` and `Rect`? Is it possible to show a small data sample of `backproj` (10 points or so) and the values of `win`? Thx – Daniel Strul Oct 27 '15 at 12:43
  • I hope it looks much better now :) , Mat (opencv data type) is a single channel hue array with backprojected probability distribution values between 0,255 i.e. values closer to 255 within the video stream areas where the hue histogram of roi matches the video stream hue otherwise it has a value of 0. Rect (also an opencv data type) defines a rectangular region in my case it is represented by the variable 'win' where (win.x, win.y) = (0,0) coordinates of the rectangle more like a 2-D array and win.width x win.height is the size of the array. I hope that helps I'm quite bad at explaining – Ragesam Oct 27 '15 at 13:39
  • @Ragesam Yes, thanks! I've updated my answer to explain your "window-jumping-all-around-the-place behavior, a simple correction should fix it. – Daniel Strul Oct 27 '15 at 13:49
  • there is a correction in the last comment (win.x, win.y) does not equal (0,0) but represent the lower left corner global coordinates of the search windows within the backprojected image. – Ragesam Oct 27 '15 at 14:08
  • @Ragesam Yes, I had guessed that much. that's why you need to integrate [win.x, win.y] at some point in your computation.. Right now, you are not computing [ average(x), average(y) ], you are computing [ average(i), average(j) ]. Thus are offset by an error [ deltax , deltay ] = [ average(x), average(y) ] - [ average(i), average(j) ] = [ -win.x, -win.y ] – Daniel Strul Oct 27 '15 at 14:49
  • Done, I've rewritten the answer to provide you with a better solution (simpler but also more efficient and more accurate) – Daniel Strul Oct 27 '15 at 15:02
  • 1
    Thanks a lot, I can not imagine I overlooked working in global image coordinates. In any case you helped me find not just one bug but 3 of them :) and it's working quite fine now. – Ragesam Oct 27 '15 at 20:13