The paper "Robust Real-Time Face Detection" by Paul Viola and Michael Jones says: "4916 positive training examples were hand picked aligned, normalized, and scaled to a base resolution of 24x24. 10,000 negative examples were selected by randomly picking sub-windows from 9500 images which did not contain faces."
My question is: what do they mean by "hand picked aligned, normalized, and scaled to a base resolution of 24x24"?
Does "hand picked aligned" mean they have 4916 positive images of 4916 different faces? Does "normalized" mean each of the 4916 images has the same properties (file size, file type, and color mode, i.e. grayscale or color)? Does "scaled to a base resolution of 24x24" mean each of the 4916 images is resized to 24x24 pixels?
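
To make my reading concrete, here is a minimal sketch of what I imagine the preprocessing might look like. It is only my guess, not the paper's code: the function names (to_gray, resize_nearest, normalize, preprocess) are hypothetical, nearest-neighbour resizing is just the simplest choice, and treating "normalized" as zero-mean, unit-variance pixel intensities is an assumption on my part. It also assumes each face has already been cropped and aligned by hand.

```python
# Hypothetical sketch of one reading of "normalized" and "scaled to 24x24".
# Not taken from the paper; every name here is made up for illustration.

BASE = 24  # base resolution quoted in the paper


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of gray values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def resize_nearest(gray, size=BASE):
    """Resize a 2-D list of pixels to size x size by nearest-neighbour sampling."""
    h, w = len(gray), len(gray[0])
    return [[gray[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def normalize(gray):
    """Shift pixels to zero mean and scale to unit variance (assumed meaning)."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5 or 1.0  # fall back to 1.0 so a flat image does not divide by zero
    return [[(p - mean) / std for p in row] for row in gray]


def preprocess(rgb_image):
    """Grayscale, resize to BASE x BASE, then normalize intensities."""
    return normalize(resize_nearest(to_gray(rgb_image)))
```

If this reading is right, every positive example would end up as a 24x24 grid of intensity values regardless of the original file size or type. Is that what the authors mean?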
Thanks for your time!