Just curious: how long would it take to train the VGG16 model on ImageNet using a Google Colab TPU? If someone could explain the calculations they did to get to the answer, that would be great!
2 Answers
It's very hard to accurately estimate how long it'll take to train a model end-to-end. But assuming you're just looking for a very rough estimate, we can start by noting that this ResNet-50 implementation we have (code) runs to convergence (76%+ top-1 accuracy when trained for 90 epochs) in roughly 7.3 hours on a v2-8 TPU device. Given that VGG16 is close enough in per-step time (https://github.com/jcjohnson/cnn-benchmarks#cnn-benchmarks), I'd expect its time to convergence to scale roughly proportionally. However, the disclaimer is that this is a very rough estimate, and actual performance will also depend on how optimized the implementation is.
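A minimal sketch of that scaling argument, assuming training time is roughly proportional to per-iteration (forward + backward) cost; the step times below are placeholders, not values from this answer, so substitute the actual numbers for your batch size from the linked cnn-benchmarks table:

```python
# Back-of-the-envelope scaling from a known ResNet-50 TPU result to VGG16.
# The step times are illustrative placeholders; look up real forward+backward
# times at https://github.com/jcjohnson/cnn-benchmarks#cnn-benchmarks.

resnet50_hours_to_converge = 7.3   # measured: 90 epochs to 76%+ top-1 on a v2-8 TPU
resnet50_step_ms = 103.0           # assumed per-iteration time (placeholder)
vgg16_step_ms = 129.0              # assumed per-iteration time (placeholder)

# If both models need a similar number of steps to converge,
# wall-clock time scales roughly with per-step cost.
vgg16_hours_estimate = resnet50_hours_to_converge * (vgg16_step_ms / resnet50_step_ms)
print(f"Very rough VGG16 estimate: {vgg16_hours_estimate:.1f} hours")
```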

Here is the official TPU example. Training VGG-16 on an optimized TFRecord dataset with 2,990 training images, IMAGE_SIZE = [331, 331], batch_size=128, for 12 epochs takes 2m15s. I think training on the 1,281,167 ImageNet images would take approximately 15 hours.
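The arithmetic behind that estimate, assuming wall-clock time scales linearly with the number of images processed (same image size, batch size, and number of epochs):

```python
# Scale the measured small-dataset time up to full ImageNet, assuming
# time grows linearly with the number of images per epoch.

measured_seconds = 2 * 60 + 15    # 2m15s for 12 epochs over 2,990 images
small_dataset_images = 2_990
imagenet_images = 1_281_167

scale = imagenet_images / small_dataset_images       # ~428.5x more images per epoch
estimated_hours = measured_seconds * scale / 3600     # same 12 epochs, just more data
print(f"Estimated ImageNet training time: {estimated_hours:.1f} hours")  # ~16 hours
```

That lands around 16 hours, the same ballpark as the ~15-hour figure above; the exact number will depend on input pipeline efficiency and how many epochs you actually need.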
