I'm trying to accelerate a model I've built with Keras, and after some difficulty with CUDA library versions I've managed to get TensorFlow to detect my GPU. However, now when I run the model with the GPU detected, it fails with the following traceback:
2021-01-20 17:40:26.549946: W tensorflow/core/common_runtime/bfc_allocator.cc:441] ****___*********____________________________________________________________________________________
Traceback (most recent call last):
  File "model.py", line 72, in <module>
    history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2, validation_data=(x_val, y_val))
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 555, in call
    outputs = execute.execute(
  File "/home/muke/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: SameWorkerRecvDone unable to allocate output tensor. Key: /job:localhost/replica:0/task:0/device:CPU:0;ccc21c10a2feabe0;/job:localhost/replica:0/task:0/device:GPU:0;edge_17_IteratorGetNext;0:0
    [[{{node IteratorGetNext/_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_875]
Function call stack:
train_function
The model runs fine on just CPU.
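For completeness, the way I've been confirming that TensorFlow can actually see the GPU is roughly the following (this is just my own quick check, not part of the model script):

import tensorflow as tf

# quick sanity check that TensorFlow can see the GPU at all;
# this lists one physical GPU device on my machine, which is how I know detection works
print(tf.config.list_physical_devices('GPU'))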
I'm not sure whether this is related to the library versioning, but to be safe I'll detail the situation. I'm running Gentoo, but because the tensorflow package is so heavy to compile I've installed a binary package through pip, which is version 2.4.0. I've installed the latest nvidia-cuda-toolkit package as well as cudnn through my distro's package manager, but when I then test whether TensorFlow detects my GPU (the check shown above), it complains that it can't find libcusolver.so.10, whereas the latest toolkit only provides libcusolver.so.11. I tried downgrading to a version of the CUDA toolkit that still shipped libcusolver.so.10, but then TensorFlow complained about several other version 11 libraries being missing. So I've kept the latest CUDA toolkit package installed and additionally dropped the older libcusolver.so.10 files into the /opt/cuda/lib64 directory. I understand this is a hacky solution, but I'm not sure what else I can do if that's what it's looking for.
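To sanity-check that the copied libcusolver.so.10 is at least loadable on my system, I tried something along these lines (the ctypes check is just my own ad-hoc test, not anything TensorFlow does):

import ctypes

# try to dlopen the older cuSOLVER library that TensorFlow 2.4 asks for;
# this raises OSError if the dynamic loader can't find or open it
ctypes.CDLL("libcusolver.so.10")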
Here's my full model code using keras:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# input_shape, num_classes, x_train, y_train, x_val, y_val are defined earlier in the script

# two conv/pool blocks followed by a small dense head
model = Sequential()
model.add(Conv2D(8, (7, 7), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(16, (7, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(num_classes, activation='softmax'))
model.summary()

batch_size = 1000
epochs = 100

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
                    verbose=2, validation_data=(x_val, y_val))
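Since the error looks like an allocation failure, one thing I've been wondering about is whether telling TensorFlow not to grab GPU memory all at once would make a difference. This is the kind of snippet I had in mind, placed before the model is built (set_memory_growth is my guess at the relevant knob, not something I've confirmed fixes this particular error):

import tensorflow as tf

# ask TensorFlow to allocate GPU memory incrementally instead of reserving
# (nearly) all of it up front; must run before any GPU op is executed
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

Lowering batch_size from 1000 is the other obvious thing I can try, but I'd like to understand why the allocation fails on the GPU when the same model trains fine on the CPU.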