
I'm using CoreML to train my own model for animals. The Apple docs say:

Use at least 10 images per label for the training set, but more is always better. Also, balance the number of images for each label. For example, don’t use 10 images for Cheetah and 1000 images for Elephant.

I'm using a Python script to download up to 1000 images per data set (1000 bears, 1000 cheetahs, 1000 elephants, etc.). What I notice is that the counts vary: sometimes I get 400 images of one thing, 700 of another, 900 of another, etc.

animals
  -bears (402 pics)
  -cheetahs (810 pics)
  -elephants (420 pics)
  -lions (975 pics)
  -tigers (620 pics)
  -zebras (793 pics)

To download the images via terminal I type in:

# usage: python image_download_python2.py <query> <number of images>
python image_download_python2.py 'elephants' '1000'

Because it returns some data sets with 400 images, others with 700, and others with 900, would this still be considered "balanced", or do I need to cap the download at 500 when I run the Python script so that every label hovers around 500 images no matter what?

python image_download_python2.py 'elephants' '500'

I'm pretty sure I'll always get at least 400 images.
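One way to even out the counts regardless of what the downloader returns is to downsample every label to the size of the smallest one (optionally capped). This is only a minimal sketch: `balance_labels` and the `files_by_label` dictionary are hypothetical names, the file paths are made up to match the counts listed above, and nothing here comes from Apple's tooling.

```python
import random


def balance_labels(files_by_label, cap=None, seed=0):
    """Downsample every label to the same number of files.

    files_by_label: dict mapping label -> list of file paths (hypothetical input).
    cap: optional upper bound; the target is min(smallest label, cap).
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    target = min(len(files) for files in files_by_label.values())
    if cap is not None:
        target = min(target, cap)
    # Pick `target` files at random from each label, without replacement.
    return {
        label: sorted(rng.sample(files, target))
        for label, files in files_by_label.items()
    }


# Made-up file names that mirror the counts in the question.
counts = {"bears": 402, "cheetahs": 810, "elephants": 420,
          "lions": 975, "tigers": 620, "zebras": 793}
files_by_label = {
    label: [f"{label}/{i}.jpg" for i in range(n)] for label, n in counts.items()
}

balanced = balance_labels(files_by_label)
print({label: len(files) for label, files in balanced.items()})
```

With these counts every label ends up with as many images as the smallest folder (bears), which trades some data for balance; whether that trade is worth it is exactly what I'm asking.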

Keep in mind the docs say more images is always better:

Use at least 10 images per label for the training set, but more is always better.

On a side note, what happens to the CoreML model while training when the data sets aren't balanced out like in Apple's example of 10 cheetahs and 1000 elephants?

Lance Samaria
  • I think you should try from 100 and test it, then try 200 and test it – canister_exister Dec 10 '18 at 18:24
  • @canister_exister hey there, why do you think that? – Lance Samaria Dec 10 '18 at 18:25
  • Because my CoreML model works well with 100 images. If yours learned from 402 pics of bears, you can try 400 for all categories – canister_exister Dec 10 '18 at 18:27
  • @canister_exister only 100 images, wow that’s good. Do you mind me asking what your training model is? I’m actually loading in thousands of images, I already started and I’m at about 10k or so. I’m using it to prevent uploads. For example my app is a food app, I don’t want people to upload cars or sneakers or some nonsense. My idea was to add as many food pics as possible in as many categories as possible so that it's well trained. When a photo is taken and the accuracy isn’t 80%-90%, it gets rejected. 100 images would save me a ton of time. – Lance Samaria Dec 10 '18 at 18:31
  • I'm doing face detection and then recognition, and I have 100 images of each person. Check this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537777/ – canister_exister Dec 10 '18 at 18:36
  • https://www.vision.ee.ethz.ch/datasets_extra/food-101/ You can download an image set from this website and train your model – canister_exister Dec 10 '18 at 18:37
  • The first link is going to take me a minute to read. The second link is basically the same exact thing I’m doing. I have 110 categories and I’m assuming there’s going to be over 50k images. Looks like I can save some time using some of their images and training data. Thanks! I have another question if you don’t mind. When evaluating the model, should the test data also be food or just random images? For example, I’m basically downloading everything under “pizza” on Google Images, and 300 images got returned. How can I find 300 separate images that I don’t already have to evaluate it against? – Lance Samaria Dec 10 '18 at 18:44
  • You should create an unknown category to check when there is no food in the image – canister_exister Dec 10 '18 at 18:47
  • How many images should I put inside of it? The food model is going to have 50k + images. Actually 150k with the link you sent – Lance Samaria Dec 10 '18 at 18:49
  • same amount of images as food categories – canister_exister Dec 10 '18 at 18:49
  • Sheesh, that means I have to find 150k random images of everything that’s not food. That’s going to be tough to do. – Lance Samaria Dec 10 '18 at 18:50
  • No. If you have 1000 images in each category, you need to find 1000 images for the unknown category – canister_exister Dec 10 '18 at 18:53
  • Thanks for the advice and the links. If you can somehow summarize our convo into an answer, especially the part where you say the evaluation images should match the same number of images as the training data and to create an unknown category for images with no food, I’ll accept it as the answer. I'm sure there are other people who will come across this problem. – Lance Samaria Dec 10 '18 at 18:55
  • If I have 5 categories (Chinese food, burgers, pizza, bbq, juices) each with 1000 images that means I need 5 unknown categories with 1000 images each? – Lance Samaria Dec 10 '18 at 18:57
  • I think you need 1 food category and 1 unknown – canister_exister Dec 10 '18 at 20:07
  • Apple’s example has 1 model/folder named safari animals with subfolders of elephants, giraffes, lions and so on with pics of each animal in each subfolder. I’m doing the same thing except using food for the model/folder name with all the different types of food pizza, tacos, soup, soul food, and so on. Isn’t my food folder/model 1 main category? – Lance Samaria Dec 10 '18 at 20:22
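On the evaluation question raised in the comments: rather than hunting for a second set of images per label, one common approach is to hold back a fraction of each downloaded label (including the unknown one) and never show it to the model during training. A minimal sketch, assuming a 20% hold-out; `split_train_test`, the 20% figure, and the file names are illustrative assumptions, not something stated in the thread or in Apple's docs.

```python
import random


def split_train_test(files_by_label, test_fraction=0.2, seed=0):
    """Hold out a fraction of each label's files for evaluation.

    Returns two dicts (train, test), each mapping label -> list of files.
    The two lists for a label never share a file.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = {}, {}
    for label, files in files_by_label.items():
        shuffled = list(files)  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test[label] = shuffled[:n_test]
        train[label] = shuffled[n_test:]
    return train, test


# Made-up file names: 300 pizza images and 300 non-food ("unknown") images.
files_by_label = {
    "pizza": [f"pizza/{i}.jpg" for i in range(300)],
    "unknown": [f"unknown/{i}.jpg" for i in range(300)],
}

train, test = split_train_test(files_by_label)
print({label: (len(train[label]), len(test[label])) for label in files_by_label})
```

Splitting per label keeps the held-out set in the same proportions as the training set, so an unbalanced label shows up in the evaluation numbers instead of being hidden by them.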
