Just curious: how long would it take to train the VGG16 model on ImageNet using a Google Colab TPU? If someone could explain the calculations they did to get to the answer, that would be great!
2 Answers
It's very hard to accurately estimate how long it'll take to train a model end-to-end. But assuming you're just looking for a very rough estimate, we can start by noting that this ResNet-50 implementation we have (code) runs to convergence (76%+ top-1 accuracy when trained for 90 epochs) in roughly 7.3 hours on a v2-8 TPU device. Given that VGG16 is close enough in per-step time (https://github.com/jcjohnson/cnn-benchmarks#cnn-benchmarks), I'd expect its time to convergence to scale roughly proportionally. However, the disclaimer is that this is a very rough estimate, and actual performance will also depend on how optimized the implementation is.
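A minimal sketch of that scaling argument, assuming training time is roughly proportional to per-iteration (forward + backward) cost; the step times below are placeholders, not values from this answer, so substitute the actual numbers for your batch size from the linked cnn-benchmarks table:

```python
# Back-of-the-envelope scaling from a known ResNet-50 TPU result to VGG16.
# The step times are illustrative placeholders; look up real forward+backward
# times at https://github.com/jcjohnson/cnn-benchmarks#cnn-benchmarks.

resnet50_hours_to_converge = 7.3   # measured: 90 epochs to 76%+ top-1 on a v2-8 TPU
resnet50_step_ms = 103.0           # assumed per-iteration time (placeholder)
vgg16_step_ms = 129.0              # assumed per-iteration time (placeholder)

# If both models need a similar number of steps to converge,
# wall-clock time scales roughly with per-step cost.
vgg16_hours_estimate = resnet50_hours_to_converge * (vgg16_step_ms / resnet50_step_ms)
print(f"Very rough VGG16 estimate: {vgg16_hours_estimate:.1f} hours")
```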

Here is the official TPU example. Training VGG-16 on an optimized TFRecord dataset with 2,990 training images, IMAGE_SIZE = [331, 331], batch_size=128, for 12 epochs takes 2m15s. I think training on the 1,281,167 ImageNet images would take approximately 15 hours.
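The arithmetic behind that estimate, assuming wall-clock time scales linearly with the number of images processed (same image size, batch size, and number of epochs):

```python
# Scale the measured small-dataset time up to full ImageNet, assuming
# time grows linearly with the number of images per epoch.

measured_seconds = 2 * 60 + 15    # 2m15s for 12 epochs over 2,990 images
small_dataset_images = 2_990
imagenet_images = 1_281_167

scale = imagenet_images / small_dataset_images       # ~428.5x more images per epoch
estimated_hours = measured_seconds * scale / 3600     # same 12 epochs, just more data
print(f"Estimated ImageNet training time: {estimated_hours:.1f} hours")  # ~16 hours
```

That lands around 16 hours, the same ballpark as the ~15-hour figure above; the exact number will depend on input pipeline efficiency and how many epochs you actually need.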
