
I need to implement a model similar to PointNet on the GPU, using TensorFlow in Python. There is an example of PointNet with TensorFlow and Keras (https://keras.io/examples/vision/pointnet/). However, even this simple example gives me an OOM error with lots of messages being printed (see error below). I tried reducing the number of points from 2500 to 1000, which didn't work. That would be no real solution anyway, since my actual dataset is much bigger. I also reduced the batch size to 1. I'm pretty sure that the GPU setup is right, because TensorFlow recognizes the card and basic models train properly.

My system:
- Windows 10
- Intel Core i7-9700 (3 GHz / 8 cores)
- 32 GB RAM
- Nvidia RTX 2080 (8 GB VRAM)
- Python 3.7.13
- CUDA 11.6
- cuDNN 8.1

I am very thankful for any help!
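For context, this is roughly how I reduced the point count and batch size in the input pipeline (a simplified sketch with dummy data, not my exact script; the model itself is built as in the linked Keras example):

```python
import numpy as np
import tensorflow as tf

# Knobs I changed while debugging the OOM.
NUM_POINTS = 1000   # reduced from 2500
BATCH_SIZE = 1      # reduced step by step down to 1

# Dummy point clouds just to illustrate the input shapes; the real ones
# come from the data-loading code of the tutorial.
train_points = np.random.rand(64, NUM_POINTS, 3).astype("float32")
train_labels = np.random.randint(0, 10, size=(64,)).astype("int64")

train_dataset = (
    tf.data.Dataset.from_tensor_slices((train_points, train_labels))
    .shuffle(len(train_points))
    .batch(BATCH_SIZE)
)

# model = ...  # built as in https://keras.io/examples/vision/pointnet/
# model.fit(train_dataset, epochs=20)
```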

Error message:

2022-09-13 08:49:14.385935: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 62 Chunks of size 512 totalling 31.0KiB
2022-09-13 08:49:14.386112: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 53 Chunks of size 1024 totalling 53.0KiB
2022-09-13 08:49:14.386289: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 1280 totalling 2.5KiB
2022-09-13 08:49:14.386463: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1536 totalling 1.5KiB
2022-09-13 08:49:14.386640: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 49 Chunks of size 2048 totalling 98.0KiB
2022-09-13 08:49:14.386817: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2304 totalling 2.2KiB
2022-09-13 08:49:14.386989: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2560 totalling 2.5KiB
2022-09-13 08:49:14.387164: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 3328 totalling 3.2KiB
2022-09-13 08:49:14.387336: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 3584 totalling 3.5KiB
2022-09-13 08:49:14.387506: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 3840 totalling 7.5KiB
2022-09-13 08:49:14.387681: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 12 Chunks of size 4096 totalling 48.0KiB
2022-09-13 08:49:14.387859: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 4608 totalling 27.0KiB
2022-09-13 08:49:14.388033: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 4864 totalling 4.8KiB
2022-09-13 08:49:14.388207: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 5120 totalling 20.0KiB
2022-09-13 08:49:14.388389: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 6912 totalling 13.5KiB
2022-09-13 08:49:14.388563: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 7680 totalling 7.5KiB
2022-09-13 08:49:14.388739: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 8 Chunks of size 8192 totalling 64.0KiB
2022-09-13 08:49:14.388912: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 8448 totalling 33.0KiB
2022-09-13 08:49:14.389088: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 11520 totalling 11.2KiB
2022-09-13 08:49:14.389264: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 22 Chunks of size 131072 totalling 2.75MiB
2022-09-13 08:49:14.389445: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 160256 totalling 469.5KiB
2022-09-13 08:49:14.389624: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 260608 totalling 254.5KiB
2022-09-13 08:49:14.389803: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 261888 totalling 255.8KiB
2022-09-13 08:49:14.389983: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 320000 totalling 312.5KiB
2022-09-13 08:49:14.390162: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 320512 totalling 1.22MiB
2022-09-13 08:49:14.390341: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 16 Chunks of size 524288 totalling 8.00MiB
2022-09-13 08:49:14.390517: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 641024 totalling 1.22MiB
2022-09-13 08:49:14.390751: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1282048 totalling 1.22MiB
2022-09-13 08:49:14.390930: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2243584 totalling 2.14MiB
2022-09-13 08:49:14.391109: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 3846144 totalling 11.00MiB
2022-09-13 08:49:14.391291: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 12804096 totalling 12.21MiB
2022-09-13 08:49:14.391473: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 15630336 totalling 14.91MiB
2022-09-13 08:49:14.391656: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 38412288 totalling 36.63MiB
2022-09-13 08:49:14.391840: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 11 Chunks of size 41025536 totalling 430.38MiB
2022-09-13 08:49:14.392023: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 42307584 totalling 121.04MiB
2022-09-13 08:49:14.392209: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 55031040 totalling 52.48MiB
2022-09-13 08:49:14.392390: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 76824576 totalling 73.27MiB
2022-09-13 08:49:14.392568: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 7 Chunks of size 82051072 totalling 547.75MiB
2022-09-13 08:49:14.392750: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 83333120 totalling 79.47MiB
2022-09-13 08:49:14.392935: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 113449472 totalling 108.19MiB
2022-09-13 08:49:14.393122: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 155348992 totalling 148.15MiB
2022-09-13 08:49:14.393305: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 401281024 totalling 382.69MiB
2022-09-13 08:49:14.393491: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 656408576 totalling 3.67GiB
2022-09-13 08:49:14.393691: I tensorflow/core/common_runtime/bfc_allocator.cc:1095] Sum Total of in-use chunks: 5.66GiB
2022-09-13 08:49:14.393942: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] total_region_allocated_bytes_: 6256918528 memory_limit_: 6256918528 available bytes: 0 curr_region_allocation_bytes_: 12513837056
2022-09-13 08:49:14.394248: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] Stats: 
Limit:                      6256918528
InUse:                      6073864704
MaxInUse:                   6090222336
NumAllocs:                         915
MaxAllocSize:                673185792
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-09-13 08:49:14.394828: W tensorflow/core/common_runtime/bfc_allocator.cc:491] **************************************************************************************************__
2022-09-13 08:49:14.395096: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at conv_ops.cc:684 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[313,512,1,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ResourceExhaustedError: Graph execution error:

Detected at node 'pointnet/dense_14/ActivityRegularizer/Square' defined at (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 889, in train_step
      y_pred = self(x, training=True)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 490, in __call__
      return super().__call__(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 1014, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\functional.py", line 459, in call
      inputs, training=training, mask=mask)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\functional.py", line 596, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 1017, in __call__
      self._handle_activity_regularization(inputs, outputs)
    File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 2566, in _handle_activity_regularization
      activity_loss = self._activity_regularizer(output)
    File "<string>", line 11, in __call__
Node: 'pointnet/dense_14/ActivityRegularizer/Square'
failed to allocate memory
     [[{{node pointnet/dense_14/ActivityRegularizer/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_9733]
SvenW
  • What is the batch size, did you try reducing it? – Frightera Sep 17 '22 at 16:39
  • I tried reducing the batch size; even the extreme of only one sample per batch didn't help – SvenW Sep 17 '22 at 17:18
  • Have you tried one of the answers here (more so answers 2 or 3): https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory – Djinn Sep 17 '22 at 19:26
  • @Djinn I tried both (a sketch of what I did is below); from the TensorFlow log messages I can see the setting was applied. However, I still get the OOM error – SvenW Sep 19 '22 at 07:57
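
For reference, this is roughly what I tried from the linked answers (a minimal sketch; both variants only change how TensorFlow reserves GPU memory, they do not reduce what the model itself needs):

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

# Variant 1: allocate GPU memory on demand instead of grabbing it all up front.
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Variant 2: cap the memory TensorFlow may use (here ~6 GB of the 8 GB card).
# tf.config.set_logical_device_configuration(
#     gpus[0],
#     [tf.config.LogicalDeviceConfiguration(memory_limit=6144)],
# )
```

With either variant TensorFlow logs that the configuration was applied, but the OOM still occurs.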

0 Answers