I am trying to run my Keras model with model parallelism. I have searched the web and found guidance from TensorFlow and elsewhere, but it didn't work. I keep getting this error:
2022-08-02 09:43:51.638045: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-02 09:43:51.861293: F tensorflow/core/platform/statusor.cc:33] Attempting to fetch value instead of handling error INTERNAL: failed initializing StreamExecutor for CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY: out of memory; total memory reported: 42505273344
Aborted (core dumped)
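From what I've read, TensorFlow reserves nearly all GPU memory up front by default, which is what the `CUDA_ERROR_OUT_OF_MEMORY` above seems to be about. A minimal sketch of the memory-growth setting from the TensorFlow docs (this must run before any GPU operation; it is context, not my actual fix):

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# reserving (almost) all of it at startup. This must be called
# before any GPU op is executed.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```
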
Could somebody explain what the problem is and how I can solve it?
Thank you.
More explanation: I want to split the model so that each GPU runs one part of it, to avoid the memory problem.
The code:
import tensorflow as tf


def model():
    # seg, expanding_layer and refinement are helper functions defined elsewhere
    input_low = tf.keras.layers.Input((None, None) + (3,))
    input_med = tf.keras.layers.Input((None, None) + (3,))
    input_high = tf.keras.layers.Input((None, None) + (3,))
    # segmentation stage
    seg_low = seg(input_low, 'seg_low')
    expand_seg_low = expanding_layer(seg_low)
    seg_med = seg(input_med, 'seg_med')
    expand_seg_med = expanding_layer(seg_med)
    seg_high = seg(input_high, 'seg_high')
    expand_seg_high = expanding_layer(seg_high)
    # refinement stage
    inps = [input_low, input_med, input_high]
    segs = [expand_seg_low, expand_seg_med, expand_seg_high]
    final_out = refinement(inps, segs)
    model = tf.keras.Model(inputs=[input_low, input_med, input_high],
                           outputs=[seg_low, seg_high, final_out])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss=['binary_crossentropy',
                        'binary_crossentropy',
                        'mse'],
                  metrics=['accuracy'])
    # tf.keras.utils.plot_model(model, to_file="model.png", show_shapes=True)
    return model
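To make my goal concrete, this is the kind of placement I am trying to achieve: a minimal sketch with placeholder `Conv2D` layers (not my real `seg`/`refinement` blocks), assuming two visible devices `/GPU:0` and `/GPU:1`:

```python
import tensorflow as tf

# Sketch of manual model parallelism: each tf.device block pins
# part of the graph to one GPU; activations cross devices automatically.
def parallel_model():
    inp = tf.keras.layers.Input((None, None, 3))
    # first half of the network on GPU 0
    with tf.device('/GPU:0'):
        x = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(inp)
    # second half on GPU 1
    with tf.device('/GPU:1'):
        out = tf.keras.layers.Conv2D(1, 1, activation='sigmoid')(x)
    return tf.keras.Model(inputs=inp, outputs=out)
```

Is this the right way to place each stage of my model on its own GPU, or does the OOM above mean something else is wrong?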