
I want to train my classifier with some images, some of which have different dimensions.

They all fall under the following dimensions:

  • 100x50
  • 50x100
  • 64x72
  • 72x64

However, with 9 orientation bins and 8 pixels per cell, each of these sizes generates 648 HoG features.

I deliberately restricted all images to one of these sizes so that they would all produce the same number of HoG features, keeping training uniform.

I opted for this because the object of interest sometimes has a different aspect ratio across the training images, so cropping every image to a single size would have left too much background in some of them.

Now my question is: do the aspect ratio and image dimensions of the training images matter, as long as the number of HoG features is consistent? (My training algorithm only takes in the HoG features.)
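The arithmetic behind the matching descriptor lengths can be sketched as follows (this assumes one cell per block and no block overlap; `hog_feature_count` is a hypothetical helper for illustration, not part of any library):

```python
import math

def hog_feature_count(width, height, orientations=9, cell_size=8):
    # With one cell per block and no overlap, the descriptor length is
    # (cells along x) * (cells along y) * (orientation bins).
    cells_x = math.floor(width / cell_size)
    cells_y = math.floor(height / cell_size)
    return cells_x * cells_y * orientations

for w, h in [(100, 50), (50, 100), (64, 72), (72, 64)]:
    print((w, h), hog_feature_count(w, h))  # all four sizes give 648
```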

user961627

1 Answer


If your HOG features all use 8x8 cells, then how can you get the same-size vector for different image sizes? Wouldn't you have more cells in a larger image?

Generally, if you want to use HOG, you should resize all images to be the same size.
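As a minimal sketch of what "resize everything to one size" buys you (nearest-neighbour resizing here is just a stand-in for a proper library resizer such as `skimage.transform.resize`; `nn_resize` and the 64x64 target are illustrative choices, not anything from the question):

```python
import numpy as np

TARGET_H, TARGET_W = 64, 64  # illustrative fixed size; pick what fits your objects

def nn_resize(image, out_h, out_w):
    """Nearest-neighbour resize -- a minimal stand-in for a library resizer."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Whatever its original dimensions, every image ends up 64x64, so the
# 8x8-pixel cell grid is always 8x8 cells, and cell (i, j) covers the
# same relative region of the object in every descriptor.
uniform = nn_resize(np.zeros((100, 50)), TARGET_H, TARGET_W)
```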

Another question: do you just want to classify images that are already cropped, or do you want to detect objects in a large scene? If you just want to classify, then the variation in the aspect ratio may be a problem. On the other hand, if you want to do sliding-window object detection, the variation in the aspect ratio is a much bigger problem. You may have to break your category into sub-classes based on the aspect ratio, and train a separate detector for each one.

Edit: Sorry, but getting the HOG vectors to be the same length by using roundoff errors and differences in aspect ratio is cheating. :) The whole point is to have the HOG cells encode spatial information. The corresponding cells must encode the same spot in different images. Otherwise you are comparing apples and oranges.
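The mismatch is easy to see numerically: if you flatten each image's cell grid row by row into one 648-long vector, the same feature index lands on different spots in images with different layouts. (`cell_center` below is a hypothetical helper for illustration; it treats the first number as width and assumes row-major flattening of 8x8 cells.)

```python
def cell_center(index, width, height, cell=8):
    """Pixel location (x, y) covered by the index-th cell in a row-major flattening."""
    cells_x = width // cell
    row, col = divmod(index, cells_x)
    return (col * cell + cell // 2, row * cell + cell // 2)

# The same flattened cell index corresponds to different image locations:
print(cell_center(10, 100, 50))  # grid is 12 cells wide -> (84, 4)
print(cell_center(10, 64, 72))   # grid is 8 cells wide  -> (20, 12)
```

Two descriptors of equal length are only comparable if each position encodes the same relative spot; here position 10 clearly does not.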

As far as object detection, aspect ratio is paramount. You would be sliding a window over the image, and that window had better have the same aspect ratio as the objects you are trying to detect. Otherwise, it simply won't work. So if you have these 4 distinct aspect ratios, your best bet is to train 4 different detectors.
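A sketch of what "one detector per aspect ratio" looks like in practice (the window sizes reuse the question's four shapes; `detectors` and `hog_of_crop` in the commented usage are hypothetical names, not real APIs):

```python
# One window shape per aspect-ratio sub-class (illustrative sizes
# taken from the question's four image dimensions):
WINDOWS = [(100, 50), (50, 100), (64, 72), (72, 64)]

def sliding_windows(img_w, img_h, win_w, win_h, stride=8):
    """Yield (x, y, w, h) boxes covering the scene at a fixed stride."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

# Usage sketch: run each aspect-ratio-specific detector over its own
# window shape, so the window always matches the shape it was trained on.
# for (w, h) in WINDOWS:
#     for box in sliding_windows(scene_w, scene_h, w, h):
#         score = detectors[(w, h)].predict(hog_of_crop(box))
```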

Dima
  • To your first question - not really. For example, with a 50x100 image, 9 orientations, 8x8 cells, and 1 cell per block, I end up with `floor(50/8) * floor(100/8) * 9 = 648` HoG features. For a 64x72 image, it's `floor(64/8) * floor(72/8) * 9 = 648` again. I have to do both things you mentioned: 1) crop the training and test images myself and just classify cropped images, and 2) detect an object in a large scene too. I thought this approach (the same number of HoG features for all images, regardless of their dimensions) would cater to both issues. – user961627 Jul 11 '14 at 18:18
  • sort of a short follow-up question: http://stackoverflow.com/users/961627/user961627 – user961627 Aug 07 '14 at 10:25