I'm using Core ML to train my own model for animal classification. Reading the Apple docs, it says:
Use at least 10 images per label for the training set, but more is always better. Also, balance the number of images for each label. For example, don’t use 10 images for Cheetah and 1000 images for Elephant.
I'm using a Python script to download up to 1000 images per data set (1000 bears, 1000 cheetahs, 1000 elephants, etc.). What I notice is that sometimes I get 400 images of one label, 700 of another, 900 of another, and so on:
animals
-bears (402 pics)
-cheetahs (810 pics)
-elephants (420 pics)
-lions (975 pics)
-tigers (620 pics)
-zebras (793 pics)
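As a quick sanity check on the balance, something like the sketch below could count the images per label. It assumes the layout shown above (an animals folder with one subfolder per label) and common image extensions; those names are assumptions on my side, not part of the download script.

import os

dataset_dir = "animals"  # assumed layout: animals/<label>/<image files>
image_exts = (".jpg", ".jpeg", ".png")

counts = {}
for label in sorted(os.listdir(dataset_dir)):
    folder = os.path.join(dataset_dir, label)
    if not os.path.isdir(folder):
        continue
    # Count only the files that look like images.
    counts[label] = sum(1 for name in os.listdir(folder)
                        if name.lower().endswith(image_exts))

for label in sorted(counts):
    print("%s: %d pics" % (label, counts[label]))
print("smallest class: %d, largest class: %d"
      % (min(counts.values()), max(counts.values())))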
To download the images via the terminal, I run:
# python image_download_python2.py <query> <number of images>
python image_download_python2.py 'elephants' '1000'
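To request the same target count for every label in one pass, a small wrapper like this sketch could loop over the labels. It assumes image_download_python2.py is in the current directory and takes exactly the two positional arguments shown above; the label list is just copied from the folder listing earlier.

import subprocess

# Labels copied from the folder listing above; adjust as needed.
labels = ["bears", "cheetahs", "elephants", "lions", "tigers", "zebras"]
target = "1000"  # same requested count for every label

for label in labels:
    # Calls the download script once per label, exactly as shown above.
    subprocess.check_call(["python", "image_download_python2.py", label, target])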
Because some data sets come back with around 400 images, others with 700, and others with 900, would this still be considered "balanced", or do I need to set a cap of 500 when I run the Python script so that every label hovers around 500 images no matter what?
python image_download_python2.py 'elephants' '500'
I'm pretty much sure I'll always get at least 400 images.
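If capping the download at 500 isn't enough and the folders still end up uneven, I could also trim every label down to the size of the smallest class after downloading, roughly like this sketch. The paths and extensions are assumptions, and the surplus images are moved aside rather than deleted.

import os
import shutil

dataset_dir = "animals"         # assumed layout: animals/<label>/<image files>
overflow_dir = "animals_extra"  # surplus images get moved here, not deleted
image_exts = (".jpg", ".jpeg", ".png")

labels = [d for d in sorted(os.listdir(dataset_dir))
          if os.path.isdir(os.path.join(dataset_dir, d))]

# Collect the image files for each label.
files = {}
for label in labels:
    folder = os.path.join(dataset_dir, label)
    files[label] = sorted(f for f in os.listdir(folder)
                          if f.lower().endswith(image_exts))

# Use the smallest class as the common cap (e.g. ~400 here).
cap = min(len(v) for v in files.values())

for label in labels:
    extra = files[label][cap:]  # everything beyond the cap
    dest = os.path.join(overflow_dir, label)
    if extra and not os.path.isdir(dest):
        os.makedirs(dest)
    for name in extra:
        shutil.move(os.path.join(dataset_dir, label, name),
                    os.path.join(dest, name))
    print("%s: kept %d, moved %d" % (label, cap, len(extra)))

Shuffling each list with random.shuffle before slicing would keep a random subset instead of the alphabetically first files.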
Keep in mind the docs say more images are always better:
Use at least 10 images per label for the training set, but more is always better.
On a side note, what happens to the Core ML model during training when the data sets aren't balanced, like in Apple's example of 10 Cheetahs and 1000 Elephants?