I want to get mean absolute error (MAE) for each split of data using 5-fold cross validation. I have built a custom model using Xception.
Hence, to try this, I coded the following:
# Data Generators:
train_gen = flow_from_dataframe(core_idg, train_df,
                                path_col = 'path',
                                y_col = 'boneage_zscore',
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = 32,
                                shuffle = True)
X_train, Y_train = next(train_gen) # grabs a single batch of 32 images and labels
#-----------------------------------------------------------------------
# Custom Model initiation:
base_model = Xception(input_shape = X_train.shape[1:], include_top = False, weights = 'imagenet')
base_model.trainable = True
model = Sequential()
model.add(base_model)
model.add(GlobalMaxPooling2D())
model.add(Flatten())
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation = 'linear'))
def mae_months(in_gt, in_pred):
    return mean_absolute_error(boneage_div * in_gt, boneage_div * in_pred)
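# (Side note: boneage_div is defined earlier in my notebook. My assumption is
# that it is the scale factor used when z-scoring the labels, e.g. something
# like boneage_div = train_df['boneage'].std(), so mae_months converts the
# z-scored error back into months.)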
# Compile model
adam = Adam(learning_rate = 0.0005)
model.compile(loss = 'mse', optimizer = adam, metrics = [mae_months])
#-----------------------------------------------------------------------
# KFold
n_splits = 5
kf = KFold(n_splits = n_splits, shuffle = True, random_state = 42)
I have coded up to the KFold step, but now I am stuck on how to proceed with the cross-validation loop that gets the MAE for each data split.
A post here suggests a for loop over the KFold splits, but is that only applicable when the model is something like DecisionTreeRegressor(), rather than a custom Xception-based model like mine?
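For what it is worth, here is how I imagine that loop could be adapted to a Keras model (a minimal sketch under my own assumptions, not a confirmed solution: the build_model() helper, epochs, and verbose settings are mine, and the model is rebuilt each fold so every split starts from fresh ImageNet weights). Is this on the right track?
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error as sk_mae
from tensorflow.keras.applications import Xception
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalMaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def build_model():
    # Recreate and compile the same architecture as above so that every
    # fold trains from a freshly initialized state.
    base = Xception(input_shape = X_train.shape[1:], include_top = False, weights = 'imagenet')
    base.trainable = True
    m = Sequential([base, GlobalMaxPooling2D(), Flatten(),
                    Dense(16, activation = 'relu'), Dense(1, activation = 'linear')])
    m.compile(loss = 'mse', optimizer = Adam(learning_rate = 0.0005), metrics = [mae_months])
    return m

kf = KFold(n_splits = 5, shuffle = True, random_state = 42)
fold_mae = []
for train_idx, test_idx in kf.split(X_train):
    fold_model = build_model()
    fold_model.fit(X_train[train_idx], Y_train[train_idx], epochs = 1, verbose = 0)
    pred = fold_model.predict(X_train[test_idx]).ravel()  # flatten (n, 1) -> (n,)
    # sklearn's mean_absolute_error returns one scalar per fold
    fold_mae.append(sk_mae(Y_train[test_idx], pred))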
UPDATE
After referring to the suggestion below, I applied the following code after the KFold step:
# Data Generators:
train_gen = flow_from_dataframe(core_idg, train_df,
                                path_col = 'path',
                                y_col = 'boneage_zscore',
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = 1024,
                                shuffle = True)
...
...
...
mae_list = []
n_splits = 5
kf = KFold(n_splits = n_splits, shuffle = True, random_state = 42)
split = kf.split(X_train, Y_train) # X_train, Y_train = next(train_gen) from above
for train, test in split:
    x_train, x_test, y_train, y_test = X_train[train], X_train[test], Y_train[train], Y_train[test]
    history = model.fit(x_train, y_train, validation_data = (x_test, y_test), batch_size = 16)
    pred = model.predict(x_test, batch_size = 8)
    err = mean_absolute_error(y_test, pred)
    mae_list.append(err)
I set the batch_size of train_gen to 1024 first and then ran the code above, but I get the following error:
52/52 [==============================] - 16s 200ms/step - loss: 0.9926 - mae_months: 31.5353 - val_loss: 4.4153 - val_mae_months: 81.5463
52/52 [==============================] - 9s 172ms/step - loss: 0.4185 - mae_months: 21.4242 - val_loss: 0.7401 - val_mae_months: 29.3815
52/52 [==============================] - 9s 172ms/step - loss: 0.2930 - mae_months: 17.3729 - val_loss: 0.5628 - val_mae_months: 23.9055
9/52 [====>.........................] - ETA: 7s - loss: 0.2355 - mae_months: 16.7444
ResourceExhaustedError Traceback (most recent call last)
Input In [11], in <cell line: 9>()
10 x_train, x_test, y_train, y_test = X_train[train], X_train[test], Y_train[train], Y_train[test]
11 # model = boneage_model()
12 # history = model.fit(train_gen, validation_data = (x_test, y_test))
---> 13 history = model.fit(x_train, y_train, validation_data = (x_test, y_test), batch_size = 16)
14 pred = model.predict(x_test, batch_size = 8)
15 err = mean_absolute_error(y_test, pred)
ResourceExhaustedError: Graph execution error:
....
....
....
Node: 'gradient_tape/sequential/xception/block14_sepconv2/separable_conv2d/Conv2DBackpropFilter'
OOM when allocating tensor with shape[2048,1536,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradient_tape/sequential/xception/block14_sepconv2/separable_conv2d/Conv2DBackpropFilter}}]]
The memory allocation report from the console looks like this (hopefully it makes sense):
total_region_allocated_bytes_: 5769199616
memory_limit_: 5769199616
available bytes: 0
curr_region_allocation_bytes_: 8589934592
Stats:
Limit: 5769199616
InUse: 5762760448
MaxInUse: 5769190400
NumAllocs: 192519
MaxAllocSize: 2470510592
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
Is it because my GPU cannot handle this batch_size?
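If it is a memory limit, one thing I considered (my own idea, not from any post; the candidate batch sizes are arbitrary) is probing for the largest batch size that still fits on the GPU:
import tensorflow as tf

for bs in (1024, 256, 64, 16):
    try:
        # One throwaway epoch just to check whether this batch size fits
        model.fit(x_train, y_train, batch_size = bs, epochs = 1, verbose = 0)
        print('batch_size', bs, 'fits in GPU memory')
        break
    except tf.errors.ResourceExhaustedError:
        print('batch_size', bs, 'is too large, trying a smaller one')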
UPDATE 2
I have decreased the batch_size of train_gen to 32 and took the batch_size argument out of the fit() and predict() calls. Is this the right way to determine the MAE for each data split?
Code:
# Data Generators:
train_gen = flow_from_dataframe(core_idg, train_df,
                                path_col = 'path',
                                y_col = 'boneage_zscore',
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = 32,
                                shuffle = True)
X_train, Y_train = next(train_gen)
...
...
...
mae_list = []
n_splits = 5
kf = KFold(n_splits = n_splits, shuffle = True, random_state = 42)
split = kf.split(X_train, Y_train) # X_train, Y_train = next(train_gen) from above
for train, test in split:
    x_train, x_test, y_train, y_test = X_train[train], X_train[test], Y_train[train], Y_train[test]
    history = model.fit(x_train, y_train, validation_data = (x_test, y_test))
    pred = model.predict(x_test)
    err = mean_absolute_error(y_test, pred)
    mae_list.append(err)
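One thing I am also unsure about here (my own concern, not from the comments): since model is created once outside the loop, each fold keeps training the same weights as the previous fold. If the folds are supposed to be independent, I imagine the model would have to be rebuilt per fold, e.g. with a helper like the build_model() sketch from above:
for train, test in kf.split(X_train, Y_train):
    fold_model = build_model()  # hypothetical helper from the sketch above; fresh weights per fold
    history = fold_model.fit(X_train[train], Y_train[train],
                             validation_data = (X_train[test], Y_train[test]))
    pred = fold_model.predict(X_train[test])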
UPDATE 3
According to the suggestions from the comments:

- Changed the batch_size of train_gen to 64.
- Added valid_gen, so that X_valid and y_valid are used as the validation data of the fit() method.
- Used x_test for the predict() method.
- Added a method for limiting GPU memory growth.
Code:
# Checking the GPU availability
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
physical_devices = tf.config.list_physical_devices('GPU')
# Let the allocator grow GPU memory on demand instead of reserving it all up front
tf.config.experimental.set_memory_growth(physical_devices[0], True)
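# (From what I have read, set_memory_growth has to run before the GPU is
# initialized, so I also considered wrapping it like this:)
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except RuntimeError as e:
    # Memory growth must be set before any tensors are placed on the GPU
    print(e)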
...
...
...
# Data Generators:
train_gen = flow_from_dataframe(core_idg, train_df,
                                path_col = 'path',
                                y_col = 'boneage_zscore',
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = 64,
                                shuffle = True)
X_train, Y_train = next(train_gen)
valid_gen = flow_from_dataframe(core_valid, valid_df,
                                path_col = 'path',
                                y_col = 'boneage_zscore',
                                target_size = IMG_SIZE,
                                color_mode = 'rgb',
                                batch_size = 64,
                                shuffle = True)
X_valid, y_valid = next(valid_gen)
# Getting MAE for each data split using 5-fold (KFold)
cv_mae = []
n_splits = 5
kf = KFold(n_splits = n_splits, shuffle = True, random_state = 42)
split = kf.split(X_train, Y_train)
for train, test in split:
    x_train, x_test, y_train, y_test = X_train[train], X_train[test], Y_train[train], Y_train[test]
    history = model.fit(x_train, y_train, validation_data = (X_valid, y_valid))
    pred = model.predict(x_test)
    err = mean_absolute_error(y_test, pred)
    cv_mae.append(err)
cv_mae
The output:
2/2 [==============================] - 8s 2s/step - loss: 3.6179 - mae_months: 66.8136 - val_loss: 2.1544 - val_mae_months: 47.2171
2/2 [==============================] - 1s 394ms/step - loss: 1.0826 - mae_months: 36.3370 - val_loss: 1.6431 - val_mae_months: 40.9770
2/2 [==============================] - 1s 344ms/step - loss: 0.6129 - mae_months: 23.0258 - val_loss: 1.8911 - val_mae_months: 45.6456
2/2 [==============================] - 1s 360ms/step - loss: 0.4500 - mae_months: 22.6450 - val_loss: 1.3592 - val_mae_months: 36.7073
2/2 [==============================] - 1s 1s/step - loss: 0.4222 - mae_months: 20.2543 - val_loss: 1.1010 - val_mae_months: 32.8488
[<tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.4442804, 1.3981661, 1.5037801, 2.2199252, 1.7645894, 1.4836203,
1.7916738, 1.3967942, 1.4069557, 2.516875 , 1.4077926, 1.4342965,
1.9279695], dtype=float32)>,
<tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.8153722, 1.9236553, 1.3917867, 1.5313213, 1.387209 , 1.3831038,
1.4519565, 1.4680854, 1.7810788, 2.5733376, 1.4269204, 1.3751 ,
1.446231 ], dtype=float32)>,
<tf.Tensor: shape=(13,), dtype=float32, numpy=
array([1.6616 , 1.6529323, 1.9181525, 2.536807 , 1.6306267, 2.856683 ,
2.113724 , 1.5543866, 1.9128528, 3.218016 , 1.4112593, 1.4043481,
3.229338 ], dtype=float32)>,
<tf.Tensor: shape=(13,), dtype=float32, numpy=
array([2.1295295, 1.8527019, 1.9779519, 3.1390932, 1.5525225, 2.0811615,
1.6279813, 1.87973 , 1.5029857, 1.6502519, 2.3677726, 1.8570358,
1.7251074], dtype=float32)>,
<tf.Tensor: shape=(12,), dtype=float32, numpy=
array([1.3926607, 1.7088655, 1.7379242, 3.5756006, 1.5988973, 1.3926607,
1.4928951, 1.4665956, 1.3926607, 1.4575896, 3.146022 , 1.3926607],
dtype=float32)>]
Does this mean that I now have the MAEs for the 5 data splits (the parts of the output that say numpy=array([...]))?
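For reference, this is how I currently understand that output (my own check with made-up numbers; y_pred is shaped (n, 1) like the output of model.predict): tf.keras.metrics.mean_absolute_error only averages over the last axis, so a (n,) target and a (n, 1) prediction broadcast against each other and the metric returns a length-n tensor per fold, whereas sklearn's mean_absolute_error returns a single scalar:
import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_absolute_error as sk_mae

y_true = np.array([1.0, 2.0, 3.0], dtype = np.float32)
y_pred = np.array([[1.5], [2.5], [2.0]], dtype = np.float32)  # model.predict-style (n, 1)

# Keras metric: (n,) vs (n, 1) broadcast to (n, n), then the mean is taken
# over the last axis, leaving a length-n tensor like the arrays above
print(tf.keras.metrics.mean_absolute_error(y_true, y_pred))

# sklearn: one scalar per fold, after flattening the predictions
print(sk_mae(y_true, y_pred.ravel()))  # 0.666...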