I am training a model built with TensorFlow (TF). The first epoch is roughly 100 times slower than the subsequent epochs, and I am seeing messages like:
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 958 to 1053
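For reference, a minimal sketch of how the per-epoch slowdown could be measured. The names `time_epochs`, `run_epoch`, and `fake_epoch` are hypothetical and not part of TF; `fake_epoch` is only a stand-in so the snippet runs without a model and would be replaced by one real pass over the training data.

```python
import time


def time_epochs(run_epoch, num_epochs):
    """Return the wall-clock seconds taken by each call to run_epoch(epoch_index)."""
    durations = []
    for epoch in range(num_epochs):
        start = time.perf_counter()
        run_epoch(epoch)  # hypothetical: one full pass over the training data
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload (assumption): sleeps longer on the first epoch to mimic
    # a slow warm-up. Replace with the actual training loop.
    def fake_epoch(epoch):
        time.sleep(0.05 if epoch == 0 else 0.005)

    times = time_epochs(fake_epoch, 3)
    for i, t in enumerate(times):
        print(f"epoch {i}: {t:.3f}s ({t / times[-1]:.1f}x the last epoch)")
```

Timing each epoch separately like this makes it easy to see whether only the first epoch is slow or whether the slowdown fades gradually.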
As suggested here, I tried to use tcmalloc by setting LD_PRELOAD="/usr/lib/libtcmalloc.so", but it didn't help.
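One thing that may be worth ruling out is whether the preload actually took effect inside the training process. Below is a rough, Linux-only sketch that lists memory-mapped files whose path contains a given substring by reading `/proc/self/maps`. The helper name `mapped_libraries` is hypothetical, and the snippet assumes it is run from inside the same Python process that does the training.

```python
import os


def mapped_libraries(substring, maps_path="/proc/self/maps"):
    """Return sorted paths of memory-mapped files whose path contains substring."""
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address, perms, offset, dev, inode, and optionally a path.
            fields = line.split(None, 5)
            if len(fields) == 6 and substring in fields[5]:
                found.add(fields[5].strip())
    return sorted(found)


if __name__ == "__main__":
    print("LD_PRELOAD =", os.environ.get("LD_PRELOAD"))
    if os.path.exists("/proc/self/maps"):
        # An empty list suggests the library was not loaded into this process.
        print("tcmalloc mappings:", mapped_libraries("tcmalloc"))
    else:
        print("/proc/self/maps not available (not Linux?)")
```

If the list comes back empty, the variable was probably not exported to the process that runs the training, or the library path is wrong.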
Any idea on how to make the first epoch run faster?