I'm currently training an image classifier using Nvidia DIGITS, and I'm downloading 1,000,000 images as part of the ILSVRC12 dataset. As you may know, this dataset consists of 1,000 classes with 1,000 images per class. The problem is that many of the images come from dead Flickr URLs, so a decent portion of my dataset (about 5-10%) ends up being the generic "unavailable" placeholder image shown below. I plan to go through and delete every copy of this generic image, leaving my dataset with only images relevant to each class.
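For the deletion step, here is a minimal sketch of one way to do it, assuming every copy of the placeholder is byte-identical (if Flickr re-encoded some copies, you'd need perceptual hashing instead). The paths and function names are hypothetical:

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def delete_placeholder_copies(dataset_dir: str, placeholder_path: str) -> int:
    """Delete every .jpg under dataset_dir whose bytes exactly match
    a known copy of the placeholder image. Returns the number deleted."""
    target = file_hash(Path(placeholder_path))
    deleted = 0
    for img in Path(dataset_dir).rglob("*.jpg"):
        if file_hash(img) == target:
            img.unlink()
            deleted += 1
    return deleted
```

Running it over the class folders (e.g. `delete_placeholder_copies("train/", "placeholder.jpg")`) would report how many images each class lost, which also tells you how uneven the classes become.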
This would make the class sizes uneven: instead of 1,000 images each, classes would contain somewhere between 900 and 1,000 images. Does the size of each class have to be equal? In other words, can I delete these generic images without hurting the accuracy of my classifier? Thanks in advance for your feedback.