I need to implement a model similar to PointNet on the GPU, using TensorFlow in Python. I started from the official Keras PointNet example (https://keras.io/examples/vision/pointnet/), but even this simple example gives me an OOM error with a lot of messages being printed (see error below). I tried reducing the number of points from 2500 to 1000, which didn't help — and it wouldn't be a real solution anyway, since my actual dataset is much bigger. I also reduced the batch size to 1. I'm fairly sure the GPU setup is correct, because TensorFlow recognizes the card and basic models train fine.

My system:

- Windows 10
- Intel Core i7-9700 (3 GHz / 8 cores)
- 32 GB RAM
- Nvidia RTX 2080 (8 GB VRAM)
- Python 3.7.13
- CUDA 11.6
- cuDNN 8.1

I am very thankful for any help!
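In case it matters, this is roughly how I verify the GPU setup before training (a minimal sketch using the standard `tf.config` API, not code from the PointNet example itself):

```python
import tensorflow as tf

# Sanity check: TensorFlow should list the RTX 2080 here
gpus = tf.config.list_physical_devices("GPU")
print(gpus)

# Allocate VRAM on demand instead of reserving all 8 GB up front
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```

With this in place, basic models run fine on the GPU; only the PointNet example runs out of memory.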
Error message:

```
2022-09-13 08:49:14.385935: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 62 Chunks of size 512 totalling 31.0KiB
2022-09-13 08:49:14.386112: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 53 Chunks of size 1024 totalling 53.0KiB
2022-09-13 08:49:14.386289: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 1280 totalling 2.5KiB
2022-09-13 08:49:14.386463: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1536 totalling 1.5KiB
2022-09-13 08:49:14.386640: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 49 Chunks of size 2048 totalling 98.0KiB
2022-09-13 08:49:14.386817: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2304 totalling 2.2KiB
2022-09-13 08:49:14.386989: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2560 totalling 2.5KiB
2022-09-13 08:49:14.387164: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 3328 totalling 3.2KiB
2022-09-13 08:49:14.387336: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 3584 totalling 3.5KiB
2022-09-13 08:49:14.387506: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 3840 totalling 7.5KiB
2022-09-13 08:49:14.387681: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 12 Chunks of size 4096 totalling 48.0KiB
2022-09-13 08:49:14.387859: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 4608 totalling 27.0KiB
2022-09-13 08:49:14.388033: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 4864 totalling 4.8KiB
2022-09-13 08:49:14.388207: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 5120 totalling 20.0KiB
2022-09-13 08:49:14.388389: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 6912 totalling 13.5KiB
2022-09-13 08:49:14.388563: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 7680 totalling 7.5KiB
2022-09-13 08:49:14.388739: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 8 Chunks of size 8192 totalling 64.0KiB
2022-09-13 08:49:14.388912: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 8448 totalling 33.0KiB
2022-09-13 08:49:14.389088: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 11520 totalling 11.2KiB
2022-09-13 08:49:14.389264: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 22 Chunks of size 131072 totalling 2.75MiB
2022-09-13 08:49:14.389445: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 160256 totalling 469.5KiB
2022-09-13 08:49:14.389624: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 260608 totalling 254.5KiB
2022-09-13 08:49:14.389803: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 261888 totalling 255.8KiB
2022-09-13 08:49:14.389983: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 320000 totalling 312.5KiB
2022-09-13 08:49:14.390162: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 4 Chunks of size 320512 totalling 1.22MiB
2022-09-13 08:49:14.390341: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 16 Chunks of size 524288 totalling 8.00MiB
2022-09-13 08:49:14.390517: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 2 Chunks of size 641024 totalling 1.22MiB
2022-09-13 08:49:14.390751: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 1282048 totalling 1.22MiB
2022-09-13 08:49:14.390930: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 2243584 totalling 2.14MiB
2022-09-13 08:49:14.391109: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 3846144 totalling 11.00MiB
2022-09-13 08:49:14.391291: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 12804096 totalling 12.21MiB
2022-09-13 08:49:14.391473: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 15630336 totalling 14.91MiB
2022-09-13 08:49:14.391656: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 38412288 totalling 36.63MiB
2022-09-13 08:49:14.391840: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 11 Chunks of size 41025536 totalling 430.38MiB
2022-09-13 08:49:14.392023: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 3 Chunks of size 42307584 totalling 121.04MiB
2022-09-13 08:49:14.392209: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 55031040 totalling 52.48MiB
2022-09-13 08:49:14.392390: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 76824576 totalling 73.27MiB
2022-09-13 08:49:14.392568: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 7 Chunks of size 82051072 totalling 547.75MiB
2022-09-13 08:49:14.392750: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 83333120 totalling 79.47MiB
2022-09-13 08:49:14.392935: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 113449472 totalling 108.19MiB
2022-09-13 08:49:14.393122: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 155348992 totalling 148.15MiB
2022-09-13 08:49:14.393305: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 1 Chunks of size 401281024 totalling 382.69MiB
2022-09-13 08:49:14.393491: I tensorflow/core/common_runtime/bfc_allocator.cc:1091] 6 Chunks of size 656408576 totalling 3.67GiB
2022-09-13 08:49:14.393691: I tensorflow/core/common_runtime/bfc_allocator.cc:1095] Sum Total of in-use chunks: 5.66GiB
2022-09-13 08:49:14.393942: I tensorflow/core/common_runtime/bfc_allocator.cc:1097] total_region_allocated_bytes_: 6256918528 memory_limit_: 6256918528 available bytes: 0 curr_region_allocation_bytes_: 12513837056
2022-09-13 08:49:14.394248: I tensorflow/core/common_runtime/bfc_allocator.cc:1103] Stats:
Limit: 6256918528
InUse: 6073864704
MaxInUse: 6090222336
NumAllocs: 915
MaxAllocSize: 673185792
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-09-13 08:49:14.394828: W tensorflow/core/common_runtime/bfc_allocator.cc:491] **************************************************************************************************__
2022-09-13 08:49:14.395096: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at conv_ops.cc:684 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[313,512,1,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ResourceExhaustedError: Graph execution error:
Detected at node 'pointnet/dense_14/ActivityRegularizer/Square' defined at (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1051, in train_function
return step_function(self, iterator)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 1030, in run_step
outputs = model.train_step(data)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 889, in train_step
y_pred = self(x, training=True)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\training.py", line 490, in __call__
return super().__call__(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 1014, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\functional.py", line 459, in call
inputs, training=training, mask=mask)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\functional.py", line 596, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 1017, in __call__
self._handle_activity_regularization(inputs, outputs)
File "C:\Users\weber-s\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\base_layer.py", line 2566, in _handle_activity_regularization
activity_loss = self._activity_regularizer(output)
File "<string>", line 11, in __call__
Node: 'pointnet/dense_14/ActivityRegularizer/Square'
failed to allocate memory
[[{{node pointnet/dense_14/ActivityRegularizer/Square}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_9733]
```