I want to train a large object detection model using TF2, preferably the EfficientDet D7
network. With my Tesla P100 card, which has 16 GB of memory, I am running into an "out of memory" exception, i.e. not enough memory can be allocated on the graphics card.
So I am wondering what my options are in this case. Is it correct that if I had multiple GPUs, the TF model would be split so that it fills the memory of both cards? In my case, with a second Tesla card that also has 16 GB, would I then have 32 GB in total available during training? If so, would the same apply at a cloud provider, where I could use multiple GPUs?
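To make the question concrete, here is roughly the kind of multi-GPU setup I have in mind. This is only a minimal sketch using tf.distribute.MirroredStrategy with a small stand-in Keras model, not the actual EfficientDet D7 training pipeline, and my understanding is that MirroredStrategy replicates the model on every GPU rather than splitting it, which is exactly the point I am unsure about:

```python
import tensorflow as tf

# Minimal sketch with a placeholder model, not the actual EfficientDet D7 pipeline.
# MirroredStrategy picks up all visible GPUs; my question is whether this (or any
# other strategy) lets a single model that is too big for one 16 GB card use the
# combined 32 GB of two cards, or whether each replica still has to fit on its own GPU.
strategy = tf.distribute.MirroredStrategy()
print("Number of devices:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Small stand-in model in place of the detector.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, activation="relu", input_shape=(512, 512, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Dummy data just to show the call pattern; the global batch is divided
# across the replicas.
images = tf.random.uniform((8, 512, 512, 3))
labels = tf.random.uniform((8,), maxval=10, dtype=tf.int32)
model.fit(images, labels, batch_size=8, epochs=1)
```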
Moreover, if I am wrong and splitting a model across multiple GPUs during training does not work, what other approach would allow me to train a large network that does not fit into my GPU memory?
PS: I know that I could reduce the batch_size to 1, but unfortunately that still does not solve my issue for the really large models ...
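For reference, this is the change I mean; the train_config block is from my pipeline.config for the TF2 Object Detection API (the rest of the file is omitted here):

```
train_config {
  batch_size: 1  # already at the minimum, the model itself still does not fit
  ...
}
```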