
The paper "Robust Real-Time Face Detection" by Paul Viola & Michael Jones says: "4916 positive training examples were hand picked aligned, normalized, and scaled to a base resolution of 24x24. 10,000 negative examples were selected by randomly picking sub-windows from 9500 images which did not contain faces."

My question is: what do they mean by "hand picked aligned, normalized, and scaled to a base resolution of 24x24"?

Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces? Does "normalized" mean each of the 4916 images has the same properties (file size, file type, picture color (gray scale/colored))? Does "scaled to a base resolution of 24x24" mean each of the 4916 images is re-sized to 24x24 pixels?

Thanks for your time!

amit
Koji Ikehara
    I suggest looking at the presentation http://www.cs.stevens.edu/~lxu1/CS559_data/FaceDetection_final.pdf - it nicely describes the Viola-Jones training process and how to make it better. – Lyth Dec 06 '12 at 07:09

1 Answer


Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces?

Not necessarily distinct faces - but yes, they had 4916 different photos of faces. The faces were found manually by a "human expert".

Does "normalized" mean each of the 4916 images have the same features[file size, file type, picture color(gray scale/colored)]?

They only used grey-scale pixels. "Normalized" means they made sure no picture was almost entirely "black" or "white": if a picture was very dark, it was automatically brightened, and if it was too bright, it was darkened. This is easily done by an automatic component.
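The paper describes this lighting correction as variance normalization of the training sub-windows. A minimal sketch of that idea in plain Python, assuming a grey-scale window stored as a list of rows; the function name `variance_normalize` and the `eps` guard are illustrative choices, not taken from the paper:

```python
import math


def variance_normalize(window, eps=1e-8):
    """Shift a grey-scale window to zero mean and scale it to unit variance.

    `window` is a list of rows of pixel intensities (hypothetical layout).
    """
    pixels = [p for row in window for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    std = math.sqrt(variance)
    if std < eps:
        # A completely flat window has no contrast to rescale; return zeros.
        return [[0.0 for _ in row] for row in window]
    return [[(p - mean) / std for p in row] for row in window]


# A dark patch and a brighter copy of it end up with the same normalized values.
dark = [[10, 20], [30, 40]]
bright = [[110, 120], [130, 140]]
same = variance_normalize(dark) == variance_normalize(bright)
```

After this step, two photos of the same face taken under different overall brightness or contrast look alike to the classifier, which is the point of normalizing.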

Does "scaled to a base resolution of 24x24" mean each of the 4916 images are re-sized to 24x24 pixels?

Yes, they made sure each "face" is exactly 24x24 pixels by rescaling the cropped face region.
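As a rough illustration of the rescaling step, here is a plain-Python sketch that shrinks a cropped face to 24x24 with nearest-neighbour sampling. The interpolation method is an assumption (the quoted passage does not say which one was used), and `resize_nearest` is a hypothetical helper name:

```python
def resize_nearest(image, size=24):
    """Resize a grey-scale image (list of rows) to size x size pixels.

    Uses nearest-neighbour sampling, chosen here only for simplicity.
    """
    src_h = len(image)
    src_w = len(image[0])
    out = []
    for y in range(size):
        # Map each output row back to a source row.
        src_y = min(src_h - 1, y * src_h // size)
        row = []
        for x in range(size):
            # Map each output column back to a source column.
            src_x = min(src_w - 1, x * src_w // size)
            row.append(image[src_y][src_x])
        out.append(row)
    return out


# A synthetic 48x48 "face crop" reduced to the 24x24 base resolution.
face = [[(x + y) % 256 for x in range(48)] for y in range(48)]
patch = resize_nearest(face)
```

In practice an image library with proper interpolation would give smoother results; the sketch only shows that every training example ends up with the same 24x24 shape.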

amit
  • Thanks a lot amit! You helped me a lot. I just have a few more questions. How did they normalize the grey-scale pixels? What automatic component did they use? Lastly, what kind of training did they use to produce the cascade file (xml file)? I've been following a lot of haartraining tutorials but have not yet been able to produce a good detector (xml). – Koji Ikehara Dec 07 '12 at 06:43
  • @KojiIkehara: I am not that amit. Usually, it is best to post new questions as new threads (unless they are clarifications on the suggested answer) – amit Dec 07 '12 at 06:53
  • I will be posting another thread. I hope you can answer my other questions. Thanks! – Koji Ikehara Dec 07 '12 at 08:38
  • Hi, can you take a look at my new thread? Here's the link: http://stackoverflow.com/questions/13835311/viola-jones-image-normalization . Thanks! – Koji Ikehara Dec 12 '12 at 08:33