This is a three-part question.
1) Class size: I'm training the TF Object Detection API on 5 classes whose sizes are nowhere near each other:
- No. of images in class1: 401
- No. of images in class2: 389
- No. of images in class3: 532
- No. of images in class4: 159393
- No. of images in class5: 185313
(total
This isn't a typical image classifier, so I'm guessing it isn't strictly a class-imbalance problem, but I'm wondering whether the imbalance would affect the resulting model.
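To put numbers on the imbalance, here is a minimal sketch that computes each class's share and the largest-to-smallest ratio from the counts listed above. It assumes each image is counted under exactly one class; if an image carries labels for several classes, the sum overstates the number of distinct images.

```python
# Per-class image counts, copied from the list above.
counts = {
    "class1": 401,
    "class2": 389,
    "class3": 532,
    "class4": 159393,
    "class5": 185313,
}

# Assumption: every image belongs to exactly one class, so the sum is the
# number of images. Multi-class images would be counted more than once.
total = sum(counts.values())

# Fraction of the data set contributed by each class.
shares = {name: n / total for name, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Combined share of the three small classes.
small_share = sum(shares[name] for name in ("class1", "class2", "class3"))

for name, share in shares.items():
    print(f"{name}: {counts[name]:>7d} images ({share:.2%})")
print(f"largest/smallest ratio: {imbalance_ratio:.0f}x")
print(f"classes 1-3 combined: {small_share:.2%} of all images")
```

With these counts, the three small classes together make up well under 1% of the data.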
2) Can the TF Object Detection API detect two objects when one is enclosed (bounded) by the other?
E.g. face vs. person: the face lies within the bounds of the person.
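To make the face/person case concrete, the sketch below computes the intersection over union (IoU) of a small box nested inside a large one. The coordinates are made up for illustration, in the normalized `[ymin, xmin, ymax, xmax]` order; the `iou` helper is hypothetical and not part of the API.

```python
def iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a face fully contained in a person box.
face = [0.10, 0.40, 0.25, 0.60]
person = [0.05, 0.20, 0.95, 0.80]

overlap = iou(face, person)
print(f"IoU(face, person) = {overlap:.3f}")
```

Because the face covers only a small part of the person box, the IoU between the two is low even though one box is fully enclosed by the other.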
3) This is a continuation of something I found earlier: using Faster R-CNN means batch_size has to be set to 1.
Because of this, I'm not sure whether I have to wait for the global step during training to match the number of images in the training set (approx. 340k in my custom dataset). I am using a Tesla K80 GPU with 12 GB of memory on Google Compute Engine, with 4 vCPUs and 15 GB of RAM. After about 2 days, I see the loss going well below 1, though:
INFO:tensorflow:global step 264250: loss = 0.2799 (0.755 sec/step)
INFO:tensorflow:global step 264251: loss = 0.0271 (0.787 sec/step)
INFO:tensorflow:global step 264252: loss = 0.1122 (0.677 sec/step)
INFO:tensorflow:global step 264253: loss = 0.1709 (0.797 sec/step)
INFO:tensorflow:global step 264254: loss = 0.8366 (0.790 sec/step)
INFO:tensorflow:global step 264255: loss = 0.0541 (0.741 sec/step)
INFO:tensorflow:global step 264256: loss = 0.0760 (0.781 sec/step)
INFO:tensorflow:global step 264257: loss = 0.0621 (0.777 sec/step)
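For reference, the relation between global step and passes over the data can be worked out as below. This is a rough sketch: the image count is the approximate 340k figure, the step time is the average of the eight log lines above, and it assumes one step consumes exactly `batch_size` images.

```python
num_images = 340_000      # approximate training set size
batch_size = 1            # as required in my Faster R-CNN setup
global_step = 264_257     # last step in the log above
sec_per_step = 0.763      # mean of the eight sec/step values in the log

# Assumption: each step consumes exactly batch_size images.
images_seen = global_step * batch_size

# Fraction of one full pass (epoch) over the training set.
epochs = images_seen / num_images

# Approximate wall-clock training time so far.
hours = global_step * sec_per_step / 3600

print(f"images seen: {images_seen}")
print(f"epochs completed: {epochs:.2f}")
print(f"training time: {hours:.1f} h ({hours / 24:.1f} days)")
```

Under these assumptions the run has not yet completed one full pass over the data, which matches the roughly 2 days of training time.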
How do I determine when to stop? Even at this point, the frozen inference graph that I generate from the latest checkpoint ONLY seems to detect the class with the most images (i.e. face) and doesn't detect anything else.
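Since the per-step loss jumps around a lot with a batch size of 1 (0.0271 to 0.8366 within eight steps), one way to look at it is to average over a window instead of reading single lines. Here is a minimal sketch that parses the log lines above with a regular expression; the pattern is written for this exact log format and is an assumption, not an official parser.

```python
import re
from statistics import mean

log = """\
INFO:tensorflow:global step 264250: loss = 0.2799 (0.755 sec/step)
INFO:tensorflow:global step 264251: loss = 0.0271 (0.787 sec/step)
INFO:tensorflow:global step 264252: loss = 0.1122 (0.677 sec/step)
INFO:tensorflow:global step 264253: loss = 0.1709 (0.797 sec/step)
INFO:tensorflow:global step 264254: loss = 0.8366 (0.790 sec/step)
INFO:tensorflow:global step 264255: loss = 0.0541 (0.741 sec/step)
INFO:tensorflow:global step 264256: loss = 0.0760 (0.781 sec/step)
INFO:tensorflow:global step 264257: loss = 0.0621 (0.777 sec/step)
"""

# Pattern matching the log format shown above (step, loss, sec/step).
pattern = re.compile(
    r"global step (\d+): loss = ([0-9.]+) \(([0-9.]+) sec/step\)"
)

steps, losses = [], []
for line in log.splitlines():
    match = pattern.search(line)
    if match:
        steps.append(int(match.group(1)))
        losses.append(float(match.group(2)))

# Average over the window rather than trusting any single step.
mean_loss = mean(losses)

print(f"steps {steps[0]}..{steps[-1]}: mean loss = {mean_loss:.4f}")
print(f"min = {min(losses):.4f}, max = {max(losses):.4f}")
```

A smoothed training loss only shows whether optimization has flattened out. It says nothing about whether the small classes are being detected, so it probably cannot answer the stopping question on its own.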