InternalError: Dst tensor is not initialized when doing KFold Cross Validation in TensorFlow

Question

I am trying to get the mean absolute error (MAE) for each split of the data using 5-fold (KFold) cross-validation. I have built a custom model based on Xception that takes an X-ray hand image as input and outputs the estimated age in months. When I run the for loop over kf.split(X_train) in the code below (under the cv_mae part), I get output for the first CV run. However, after the first CV run, I get the following error:

640/640 [==============================] - 86s 114ms/step - loss: 0.3346 - mae_months: 17.8703

---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Input In [15], in <cell line: 3>()
      3 for train_index, val_index in kf.split(X_train):
      4     model.fit(X_train[train_index], y_train[train_index], batch_size = 10)
----> 5     pred = model.predict(X_train[val_index], batch_size = 2)
      6     err = mean_absolute_error(y_train[val_index], pred)
      7     cv_mae.append(err)

File ~\anaconda3\lib\site-packages\keras\wrappers\scikit_learn.py:364, in KerasRegressor.predict(self, x, **kwargs)
    350 """Returns predictions for the given test data.
    351 
    352 Args:
   (...)
    361         Predictions.
    362 """
    363 kwargs = self.filter_sk_params(Sequential.predict, kwargs)
--> 364 return np.squeeze(self.model.predict(x, **kwargs))

File ~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     65 except Exception as e:  # pylint: disable=broad-except
     66   filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67   raise e.with_traceback(filtered_tb) from None
     68 finally:
     69   del filtered_tb

File ~\anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
    100     dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.


It seems the error appears every time execution reaches model.predict(), because the traceback points to:

----> 5 pred = model.predict(X_train[val_index], batch_size = 2)

Code:

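# Imports assumed by the code below
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import KFold, train_test_split
from keras.applications.xception import Xception
from keras.models import Sequential
from keras.layers import GlobalMaxPooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.metrics import mean_absolute_error
from keras.wrappers.scikit_learn import KerasRegressor

# Checking the GPU availability
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))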

physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

#---------------------------------------------------------------------------------

# Root path for the image files:
root = 'P:/BoneDataset/0-Dataset/ba-trainset/'
age_df = pd.read_csv(os.path.join(root, 'ba-training-dataset.csv'))

# Converting 'male' column to have male and female instead of true and false:
age_df['gender'] = age_df['male'].map(lambda x: 'male' if x else 'female')

# Checking that each image path exists
age_df['path'] = age_df['id'].map(lambda x: os.path.join(root, 'ba-trainset', '{}.png'.format(x)))
age_df['exists'] = age_df['path'].map(os.path.exists)
print(age_df['exists'].sum(), 'images found of total of', age_df.shape[0], 'images.')

#---------------------------------------------------------------------------------

# Oldest children age in the dataset:
print('Maximum age: ' + str(age_df['boneage'].max()) + ' months')

# Youngest children age in the dataset:
print('Minimum age: ' + str(age_df['boneage'].min()) + ' months')

# Mean of children age in the dataset: 
boneage_mean = age_df['boneage'].mean()
print('Mean BA: ' + str(boneage_mean))

# Median of children age in the dataset: 
print('Median BA: ' + str(age_df['boneage'].median()))

# Standard deviation of children age in the dataset: 
boneage_div = age_df['boneage'].std()

# Normalizing the target (boneage) to zero mean and unit standard deviation
# (z-score), which tends to make regression training better behaved:
age_df['boneage_zscore'] = age_df['boneage'].map(lambda x: (x-boneage_mean)/boneage_div)

#---------------------------------------------------------------------------------

# Resampling the data from 12000 to 10000 images:
# 10 age bins x 2 genders x 500 samples each (sampled with replacement)
age_df['boneage_category'] = pd.cut(age_df['boneage'], 10)
new_age_df = age_df.groupby(['boneage_category', 'male']).apply(lambda x: x.sample(500, replace = True)).reset_index(drop = True)
print('New Data Size:', new_age_df.shape[0], 'Old Size:', age_df.shape[0])

#---------------------------------------------------------------------------------

train_df, valid_df = train_test_split(new_age_df, test_size = 0.20, stratify = new_age_df['boneage_category'])

#---------------------------------------------------------------------------------

## Image preprocessing:
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.imagenet_utils import preprocess_input

IMG_SIZE = (224, 224)
core_idg = ImageDataGenerator(samplewise_center = True, 
                              samplewise_std_normalization = True,
                              height_shift_range = 0.05,
                              width_shift_range = 0.05, 
                              rotation_range = 10, 
                              fill_mode = 'nearest',
                              rescale = 1. / 255,
                              preprocessing_function = preprocess_input)

#---------------------------------------------------------------------------------

def flow_from_dataframe(img_data_gen, in_df, path_col, y_col, **dflow_args):
    base_dir = os.path.dirname(in_df[path_col].values[0])
    print('## Ignore next message from keras, values are replaced anyways')
    df_gen = img_data_gen.flow_from_directory(base_dir, class_mode = 'sparse', **dflow_args)
    df_gen.filenames = in_df[path_col].values
    
    # Added df_gen.filepaths.extend because filepaths is an empty list,
    # so the image paths are added to filepaths here.
    df_gen.filepaths.extend(df_gen.filenames) 
    
    df_gen.classes = np.stack(in_df[y_col].values)
    df_gen.samples = in_df.shape[0]
    df_gen.n = in_df.shape[0]
    df_gen._set_index_array()
    df_gen.directory = '' # since we have the full path
    print('Reinserting dataframe: {} images'.format(in_df.shape[0]))
    return df_gen

#---------------------------------------------------------------------------------

# Data Generators:
train_gen = flow_from_dataframe(core_idg, train_df, 
                                path_col = 'path',
                                y_col = 'boneage_zscore', 
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = len(train_df),
                                shuffle = True)

X_train, y_train = next(train_gen)

def boneage_model():
    
    base_model = Xception(input_shape = X_train.shape[1:], include_top = False, weights = 'imagenet')
    base_model.trainable = True

    model = Sequential()
    model.add(base_model)
    model.add(GlobalMaxPooling2D())
    model.add(Flatten())

    model.add(Dense(16, activation = 'relu'))
    model.add(Dense(1, activation = 'linear'))

    def mae_months(in_gt, in_pred):
        return mean_absolute_error(boneage_div * in_gt, boneage_div * in_pred) 

    # Compile model
    adam = Adam(learning_rate = 0.0005)
    model.compile(loss = 'mse', optimizer = adam, metrics = [mae_months])
    
    return model

#---------------------------------------------------------------------------------

# KFold
n_splits = 5
kf = KFold(n_splits = n_splits, shuffle = True, random_state = 42)

# create model
model = KerasRegressor(build_fn = boneage_model)

#---------------------------------------------------------------------------------
#### THIS IS WHERE THE ERROR STARTS 
cv_mae = []

for train_index, val_index in kf.split(X_train):
    model.fit(X_train[train_index], y_train[train_index], batch_size = 16)
    pred = model.predict(X_train[val_index], batch_size = 2)
    err = mean_absolute_error(y_train[val_index], pred)
    cv_mae.append(err)

cv_mae
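One thing worth checking in the loop above: y_train comes from the boneage_zscore column, so each err appended to cv_mae is in z-score (standard deviation) units rather than months. Because the mean cancels in a difference, multiplying by boneage_div converts it back to months, which is the same scaling mae_months applies inside the model. A small numpy check of that identity, using made-up numbers (the mean and std below are illustrative, not the dataset's actual statistics):

```python
import numpy as np

# Illustrative statistics only; in the question these are boneage_mean / boneage_div.
mean, std = 127.0, 41.0

ages_true = np.array([60.0, 120.0, 180.0])  # months
ages_pred = np.array([72.0, 110.0, 171.0])  # months

# Same transform as the boneage_zscore column.
z_true = (ages_true - mean) / std
z_pred = (ages_pred - mean) / std

mae_z = np.mean(np.abs(z_true - z_pred))             # units of cv_mae as written
mae_months = np.mean(np.abs(ages_true - ages_pred))  # what is usually reported

# The mean cancels in (z_true - z_pred), so scaling by std recovers months.
print(mae_z * std, mae_months)
```

So in the loop, err * boneage_div would give the per-fold MAE in months.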

Note: train_df contains 8000 X-ray hand images.
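For a sense of scale, here is a rough estimate of how much host memory these arrays take, assuming X_train is float32 (the default dtype of ImageDataGenerator batches) with shape (8000, 224, 224, 3) and 5 folds:

```python
# Rough memory footprint of the arrays used in the KFold loop.
# Assumptions: float32 images (4 bytes per value), 224x224 RGB, 8000 samples, 5 folds.
BYTES_PER_FLOAT32 = 4
H, W, C = 224, 224, 3
n_images = 8000
n_splits = 5

bytes_per_image = H * W * C * BYTES_PER_FLOAT32
total = n_images * bytes_per_image
val_fold = (n_images // n_splits) * bytes_per_image
train_fold = total - val_fold

GIB = 1024 ** 3
for name, size in [("X_train", total), ("training fold", train_fold), ("validation fold", val_fold)]:
    print(f"{name:16s} {size / GIB:6.2f} GiB")
```

That works out to roughly 4.49 GiB for X_train, 3.59 GiB for a training fold and 0.90 GiB for a validation fold. X_train[train_index] and X_train[val_index] use fancy indexing, which makes copies, and the validation copy is what predict() tries to move to the GPU in one piece next to Xception's weights, activations and optimizer state on an 8 GB card. That is plausibly part of the problem, though not confirmed.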

The post here suggests reducing the batch_size, which is why batch_size = 2 is used in model.predict() above. However, I still get the same error message. Please help!
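A likely reason batch_size = 2 does not help: in the traceback the failure happens in convert_to_eager_tensor (_EagerConst), which appears to be the step where the whole X_train[val_index] array is turned into a single tensor and copied to the GPU before any batching happens. A minimal sketch of a workaround is to call predict on smaller slices and concatenate the results. predict_in_chunks and chunk_size are hypothetical names, and predict_fn would be something like model.predict from the question's loop:

```python
import numpy as np

def predict_in_chunks(predict_fn, X, chunk_size=64):
    """Call predict_fn on consecutive slices of X and stitch the results together.

    Only one slice at a time is passed to predict_fn, so only that slice has to be
    converted to a tensor (and copied to the GPU) at once.
    """
    outputs = []
    for start in range(0, len(X), chunk_size):
        out = predict_fn(X[start:start + chunk_size])
        # KerasRegressor.predict applies np.squeeze, so a 1-row chunk can come back
        # as a 0-d array; atleast_1d keeps it concatenable.
        outputs.append(np.atleast_1d(np.asarray(out)))
    return np.concatenate(outputs, axis=0)
```

In the loop this would read pred = predict_in_chunks(model.predict, X_train[val_index], chunk_size=32). Passing val_index and slicing X_train inside the helper would also avoid the extra host copy of the validation fold.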

How much available memory does your GPU have between iterations? In my experience, you might need as much memory as one iteration requires times the number of iterations, because TensorFlow doesn't always free up memory. — Djinn, Aug 31 '22 at 17:14
So I am using the RTX 3060 Ti with 8 GB of memory. I don't know how much memory my GPU has between iterations, though... — Bathtub, Aug 31 '22 at 19:13
Follow this https://stackoverflow.com/questions/34199233/how-to-prevent-tensorflow-from-allocating-the-totality-of-a-gpu-memory - tensorflow will then just use what's needed. — Djinn, Aug 31 '22 at 19:16
I had already applied their method to my code and still get the same error message as above... Is it because, after every split, the data stays in memory, so that memory isn't available for the next split? — Bathtub, Aug 31 '22 at 20:35
So how much available memory does your GPU have between iterations? — Djinn, Aug 31 '22 at 23:04
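On memory not being released between folds: KerasRegressor.fit calls build_fn on every call, so each fold builds a new Xception while the previous one may still hold GPU memory. A common pattern is to build the model explicitly per fold and reset Keras' global state afterwards with tf.keras.backend.clear_session(). The sketch below keeps the fold bookkeeping framework-agnostic so it runs with numpy alone. make_model and cleanup are hypothetical hooks, and the shuffled index split is a plain-numpy stand-in for KFold(shuffle=True), not sklearn's exact fold assignment:

```python
import gc
import numpy as np

def cross_val_mae(make_model, X, y, n_splits=5, seed=42, cleanup=None):
    """Per-fold MAE with a fresh model per fold and an optional cleanup hook.

    make_model: hypothetical factory returning an object with fit(X, y) and predict(X).
    cleanup:    called after each fold, e.g. tf.keras.backend.clear_session.
    """
    rng = np.random.default_rng(seed)
    # Shuffle all indices once, then split them into n_splits disjoint folds.
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    scores = []
    for k, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        pred = np.ravel(model.predict(X[val_idx]))
        scores.append(float(np.mean(np.abs(y[val_idx] - pred))))
        # Drop the reference so the old model can be collected before the next fold.
        del model
        if cleanup is not None:
            cleanup()
        gc.collect()
    return scores
```

With the question's code this would be roughly cross_val_mae(boneage_model, X_train, y_train, cleanup=tf.keras.backend.clear_session). To see how much GPU memory is still in use between folds, TF 2.5 and later provide tf.config.experimental.get_memory_info('GPU:0'), which reports current and peak bytes.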
