I have a pipeline program that runs three inference processes in one go. However, the third process hits the error below:
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&)
at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/StandardGpuResources.cpp:452:
Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryBuffer
dev 0 space Device stream 0x1e9eb170 size 1073741824 bytes (cudaMalloc error out of memory [2])
I am using an RTX 3070 with 8 GB of VRAM. For more detail, the first two processes run inference with pretrained models, and the third process is a similarity search using FAISS. I hit the error when I try to move my search index from CPU to GPU in the third process. I need to run the search on the GPU, as my index holds on the order of a million vectors.
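Roughly, the failing step in the third process looks like the sketch below (the dimensionality, index type, and data here are simplified placeholders; my real index is an index of a few million vectors built elsewhere):

import numpy as np
import faiss

d = 128                                    # placeholder dimensionality
cpu_index = faiss.IndexFlatL2(d)           # stand-in for my real CPU index
cpu_index.add(np.random.rand(1000, d).astype('float32'))

res = faiss.StandardGpuResources()
# This is the call that raises the cudaMalloc out-of-memory error above
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)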
I understand that TensorFlow allocates essentially all of the GPU memory for a process by default. I tried calling set_memory_growth at the beginning of the program, but it still does not work.
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Allocate GPU memory on demand instead of grabbing it all up front
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before any GPU has been initialized
        print(e)
There are answers that suggest using per_process_gpu_memory_fraction, but this option is no longer available in TF 2.5.
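The closest replacement I could piece together from the TF 2.x docs is capping TensorFlow with a logical device configuration. Is the sketch below the right way to do that? (The 3072 MB cap is just an example figure, not something I have verified leaves enough room for FAISS.)

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Cap TensorFlow at a fixed slice of the GPU so the rest stays free for FAISS.
    # Must run before TF initializes the GPU; 3072 MB is only an example value.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=3072)])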
I tracked the memory usage with tf.config.experimental.get_memory_info('GPU:0'); the log is below.
Beginning of the program - {'current': 0, 'peak': 0}
After first inference - {'current': 281843712, 'peak': 2803776768}
After second inference - {'current': 281844480, 'peak': 2803776768}
As the first two inference processes are not using the full memory, is there any way I can free up the allocated memory for my third process? Or prevent TF 2.5 from allocating the entire GPU memory?
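Alternatively, since the error is raised while FAISS allocates its 1 GiB temporary buffer, would shrinking that buffer be enough? Something like the following is what I have in mind (the 256 MB value is an untested guess; cpu_index is the same placeholder index as in the sketch above):

import faiss

res = faiss.StandardGpuResources()
# Shrink the FAISS GPU scratch space so it fits in whatever VRAM TF leaves free.
# 256 MB is an arbitrary example value, not a tested setting.
res.setTempMemory(256 * 1024 * 1024)
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)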