In my project I am calculating HOG features on GPU for different levels in the same image. My aim is to detect the following objects.
1. Truck
2. Car
3. Person
The most important question is how to select the window size for a multi-class object detector. This post provides a very good base, but it does not explain how to select the window size in the multi-class case.
To work around this, I calculated the HOG features of each positive image at different levels/resolutions while keeping the window size (48*96) the same, but the resulting file for each image is around 600 MB, which is too large.
Please let me know how to select the window size, block size and cell size for multi-class object detection. Here is the code I used to calculate the HOG features.

void App::run()
{
    unsigned int count = 1;
    FileStorage fs;
    running = true;

    //int width;
    //int height;

    Size win_size(args.win_width, args.win_width * 2); 
    Size win_stride(args.win_stride_width, args.win_stride_height);

    cv::gpu::HOGDescriptor gpu_hog(win_size, Size(16, 16), Size(8, 8), Size(8, 8), 9,
                                   cv::gpu::HOGDescriptor::DEFAULT_WIN_SIGMA, 0.2, gamma_corr,
                                   cv::gpu::HOGDescriptor::DEFAULT_NLEVELS);

    VideoCapture vc("/home/ubuntu/Desktop/getdescriptor/images/image%d.jpg");
    Mat frame;
    Mat Left;
    Mat img_aux, img, img_to_show, img_new;
    cv::Mat temp;
    gpu::GpuMat gpu_img, descriptors, new_img;

    char cbuff[20];



    while (running)
    {

        vc.read(frame);


        if (!frame.empty())
        {
            workBegin();

            // cols is the image width, rows is the image height
            width  = frame.cols;
            height = frame.rows;

            sprintf (cbuff, "%04d", count);

            // Change format of the image
            if (make_gray) cvtColor(frame, img_aux, CV_BGR2GRAY);
            else if (use_gpu) cvtColor(frame, img_aux, CV_BGR2BGRA);
            else frame.copyTo(img_aux);

            // Resize image
            if (args.resize_src) resize(img_aux, img, Size(args.width, args.height));
            else img = img_aux;
            img_to_show = img;

            gpu_hog.nlevels = nlevels;

            hogWorkBegin();
            if (use_gpu)
            {
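                fs.open(cbuff, FileStorage::WRITE);

                // Size of the current pyramid level, starting from the image
                // that is actually processed (img may have been resized above).
                width  = img.cols;
                height = img.rows;

                for (int levels = 0; levels < nlevels; levels++)
                {
                    // Upload the current pyramid level before computing its descriptors.
                    gpu_img.upload(levels == 0 ? img : img_new);
                    gpu_hog.getDescriptors(gpu_img, win_stride, descriptors, cv::gpu::HOGDescriptor::DESCR_FORMAT_ROW_BY_ROW);
                    descriptors.download(temp);

                    // One uniquely named node per level, so that no key is written twice.
                    fs << cv::format("features_level_%d", levels) << temp;

                    cout << "Level " << levels << " (" << width << "," << height << ")" << endl;

                    // Shrink by a constant factor per level; width and height already
                    // carry the previous shrink, so scale itself must stay fixed.
                    width  = round(width / scale);
                    height = round(height / scale);

                    if (width < win_size.width || height < win_size.height)
                        break;

                    resize(img, img_new, Size(width, height));
                }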

                cout<<count<< " Image feature calculated !"<<endl;
                count++;
                //width = 640; height = 480;
                scale = 1.05;

            }

            hogWorkEnd();
            fs.release();
          }
           else  running = false;
       }
} 
shah

1 Answer


The window size should be chosen such that the object(s) you want to detect fit into the window. If you want different window sizes for different object types, this might become tricky.

Usually what you do is the following:

  1. Take training data for each type of object and train one model per object type, using the features extracted at the known positions of the objects.
  2. Then take each test image and use a sliding window approach to extract features at each location. These features are compared to each model. If one of the models leads to a score higher than a certain threshold, you have found this object. If more than one model scores higher than the threshold, simply take the one scoring highest.
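The decision rule in step 2 can be sketched as follows. This is only an illustration: `pickClass` and the `scores` vector are hypothetical names, and the scores are assumed to come from whatever classifier you trained per object type (one score per model for the current window).

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the best-scoring model for one window, or -1 if no
// model scores above the threshold. scores[i] is the (hypothetical) score
// of model i, e.g. 0 = truck, 1 = car, 2 = person.
int pickClass(const std::vector<double>& scores, double threshold)
{
    int best = -1;
    double bestScore = threshold;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
        // Only scores strictly above the threshold (and above the best
        // score seen so far) can win.
        if (scores[i] > bestScore)
        {
            bestScore = scores[i];
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With scores of 0.2, 0.9 and 0.7 and a threshold of 0.5, models 1 and 2 both pass the threshold and model 1 wins because it scores highest.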

If you want to use differently sized detection windows, you will get feature vectors of different sizes (by the nature of HoG features). The tricky thing is that in the testing phase you have to use as many sliding windows as you have object types. This would definitely work, but you have to process each test image several times, leading to a higher processing time.

To answer your question about the sizes: there is no value I can give you; it always depends on your images. Using an image pyramid as you mentioned above is a good way to deal with differently scaled objects.

  • window size: the whole object should fit in; has to be divisible by block size
  • block size has to be divisible by cell size
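To see how these sizes drive the length of the feature vector, here is a small sketch that needs no OpenCV. `Size2`, `isValidHogGeometry` and `hogDescriptorLength` are hypothetical helpers, not library calls. The window check is written in terms of the block stride, which is how OpenCV's HOG implementation checks it as far as I know; with a 16x16 block and an 8x8 stride this amounts to the window being a multiple of 8.

```cpp
#include <cstddef>

// Hypothetical stand-in for cv::Size, so the sketch compiles without OpenCV.
struct Size2 { int width; int height; };

// True if the geometry satisfies the usual HOG layout constraints:
// blocks are made of whole cells, and blocks tile the window at the stride.
bool isValidHogGeometry(Size2 win, Size2 block, Size2 stride, Size2 cell)
{
    return block.width  % cell.width  == 0 &&
           block.height % cell.height == 0 &&
           win.width  >= block.width  &&
           win.height >= block.height &&
           (win.width  - block.width)  % stride.width  == 0 &&
           (win.height - block.height) % stride.height == 0;
}

// Number of float values in the descriptor of ONE detection window.
std::size_t hogDescriptorLength(Size2 win, Size2 block, Size2 stride,
                                Size2 cell, int nbins)
{
    std::size_t blocksX = static_cast<std::size_t>((win.width  - block.width)  / stride.width  + 1);
    std::size_t blocksY = static_cast<std::size_t>((win.height - block.height) / stride.height + 1);
    std::size_t cellsPerBlock = static_cast<std::size_t>((block.width  / cell.width) *
                                                         (block.height / cell.height));
    return blocksX * blocksY * cellsPerBlock * static_cast<std::size_t>(nbins);
}
```

For the 48x96 window from the question (16x16 blocks, 8x8 stride, 8x8 cells, 9 bins) this gives 5 * 11 * 4 * 9 = 1980 values per window; the common 64x128 person window gives 3780.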

Sample code for visualization of HoG features can be found here. This also helps to understand what the feature vectors look like.

EDIT: Found out the hard way, that only cv::Size(8,8) is allowed for cell size. See documentation.

LSA
  • Thank you so much for an amazing answer. I used an image pyramid so that I don't have to calculate HOG features separately for the different object(s). Can you please tell me whether this is the right approach, or do I have to calculate HOG features for different objects with different window sizes? – shah Jun 29 '15 at 12:37
  • No, the image pyramid is there to model different scalings of the objects. If you have an image of a car close to you, it is bigger (in terms of pixels) than a car far away. You would still need different window sizes (if the sizes of the objects differ a lot, as in your case; a car is much wider than a person). So you would take a detection window of 64/128 for persons and 128/128 for cars (just some numbers, the real values really depend on your images) – LSA Jun 29 '15 at 12:40
  • I have training images of resolution 1224 * 370, and if I calculate the HOG features of each image at different levels, I get a yml file of around 800 MB. I believe I should resize my images. What do you recommend? – shah Jun 29 '15 at 12:42
  • Thank you, I got your point regarding the image pyramid. I believe the interesting question is how many levels I should choose. – shah Jun 29 '15 at 12:44
  • This again depends on your data. What is the maximal and minimal scale of your objects? – LSA Jun 29 '15 at 12:55
  • I have a corresponding label file for each training image which contains information about each object. So I believe I can get this info. – shah Jun 29 '15 at 12:58
  • Uh, just had to learn the hard way: Cell size has to be cv::Size(8,8), other sizes are currently not supported, see [documentation](http://docs.opencv.org/modules/gpu/doc/object_detection.html#gpu-hogdescriptor-hogdescriptor) – LSA Jun 30 '15 at 09:28
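On the level count and file size discussed in the comments, a rough estimate can be sketched like this. It assumes that descriptors are computed for every window position on every pyramid level, as in the question's code. `Level`, `pyramidLevels` and `totalFloats` are hypothetical helpers, and the 8x8 window stride used in the example numbers is an assumption, since the question does not state it.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One pyramid level: its size in pixels and the number of window positions.
struct Level { int width; int height; std::size_t windows; };

// Lists the pyramid levels, shrinking the image by `scale` per level until
// the detection window no longer fits or maxLevels is reached.
std::vector<Level> pyramidLevels(int imgW, int imgH, int winW, int winH,
                                 int strideX, int strideY,
                                 double scale, int maxLevels)
{
    std::vector<Level> levels;
    for (int i = 0; i < maxLevels; ++i)
    {
        // Size of level i, rounded to the nearest pixel.
        double factor = std::pow(scale, i);
        int w = static_cast<int>(std::floor(imgW / factor + 0.5));
        int h = static_cast<int>(std::floor(imgH / factor + 0.5));
        if (w < winW || h < winH)
            break;

        Level lv;
        lv.width = w;
        lv.height = h;
        lv.windows = static_cast<std::size_t>((w - winW) / strideX + 1) *
                     static_cast<std::size_t>((h - winH) / strideY + 1);
        levels.push_back(lv);
    }
    return levels;
}

// Total number of float values over all levels, given the descriptor
// length of a single window.
std::size_t totalFloats(const std::vector<Level>& levels,
                        std::size_t descriptorLength)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < levels.size(); ++i)
        total += levels[i].windows * descriptorLength;
    return total;
}
```

For a 1224 x 370 image with a 48x96 window and the assumed 8x8 stride, level 0 alone has 148 * 35 = 5180 window positions. At 1980 values per window that is about 10.3 million floats, roughly 41 MB as raw 4-byte floats and considerably more as text in a YAML file, before any further levels are added. Storing descriptors only for the labelled object windows instead of for every window position would shrink this considerably.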