I have a pipeline program that runs three inference processes in one go. However, the third process hits the error below:
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&)
at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:452:
Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryBuffer
dev 0 space Device stream 0x1e9eb170 size 1073741824 bytes (cudaMalloc error out of memory [2])
I am using an RTX 3070 with 8 GB of VRAM. For more detail, the first two processes run inference with pretrained models, and the third process is a similarity search using FAISS. I hit the error when I try to move my search index from CPU to GPU in the third process. I need to run the search on the GPU, as my index holds on the order of a million vectors.
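Roughly, the failing step in the third process looks like the sketch below (the dimensionality, index type, and data here are simplified placeholders; my real index is an index of a few million vectors built elsewhere):

import numpy as np
import faiss

d = 128                                    # placeholder dimensionality
cpu_index = faiss.IndexFlatL2(d)           # stand-in for my real CPU index
cpu_index.add(np.random.rand(1000, d).astype('float32'))

res = faiss.StandardGpuResources()
# This is the call that raises the cudaMalloc out-of-memory error above
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)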
I understand that TensorFlow allocates essentially all of the GPU memory for a process by default. I tried calling set_memory_growth at the beginning of the program, but it still does not work.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of grabbing it all up front
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before any GPU has been initialized
        print(e)
There are answers that suggest using per_process_gpu_memory_fraction, but this option is no longer available in TF 2.5.
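The closest replacement I could piece together from the TF 2.x docs is capping TensorFlow with a logical device configuration. Is the sketch below the right way to do that? (The 3072 MB cap is just an example figure, not something I have verified leaves enough room for FAISS.)

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow at a fixed slice of the GPU so the rest stays free for FAISS.
    # Must run before TF initializes the GPU; 3072 MB is only an example value.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=3072)])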
I tracked the memory usage with tf.config.experimental.get_memory_info('GPU:0'); the log is below.
Beginning of the program - {'current': 0, 'peak': 0}
After first inference - {'current': 281843712, 'peak': 2803776768}
After second inference - {'current': 281844480, 'peak': 2803776768}
As the first two inference processes are not using the full memory, is there any way I can free up the allocated memory for my third process? Or prevent TF 2.5 from allocating the entire GPU memory?
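Alternatively, since the error is raised while FAISS allocates its 1 GiB temporary buffer, would shrinking that buffer be enough? Something like the following is what I have in mind (the 256 MB value is an untested guess; cpu_index is the same placeholder index as in the sketch above):

import faiss

res = faiss.StandardGpuResources()
# Shrink the FAISS GPU scratch space so it fits in whatever VRAM TF leaves free.
# 256 MB is an arbitrary example value, not a tested setting.
res.setTempMemory(256 * 1024 * 1024)
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)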