I am working on 3D image segmentation with a convolutional neural network in Keras 2.1.1 with TensorFlow as the backend. I am using fit_generator because 3D images are very memory-intensive and I apply heavy data augmentation before each update.
Edit: I also referenced this post in a GitHub issue and added a small demo that reproduces the problematic behavior: https://github.com/keras-team/keras/issues/8837
import sys
import numpy as np
import h5py
import random
import tensorflow as tf
import scipy.misc as misc
import datetime
import time
import cv2
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam
print("Loading net")
ada = Adam(lr=0.00005, beta_1=0.1, beta_2=0.001, epsilon=1e-08, decay=0.0)
net = Net(input_shape=(128,160,144, 4),outputChannel=5,momentum=0.5)
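# Note: the string "Adam" selects Adam with its default settings; the `ada` instance created above is not used here.
net.compile(optimizer="Adam",loss=jaccard_distance_loss)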
print("Finished loading net")
filename = 'fold0_1.hdf5'
f = h5py.File(filename, 'r')
train_gen = generateData(f[u'train_x'],f[u'train_y'],augmentor=random_geometric_transformation)
val_gen = generateData(f[u'valid_x'],f[u'valid_y'])
filepath="Model-{epoch:02d}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# Train the model
net.fit_generator(generator=train_gen,
                  steps_per_epoch=200,
                  validation_data=val_gen,
                  validation_steps=37,
                  epochs=800,
                  callbacks=callbacks_list)
The problem is that although the loss reported during training is relatively low, the loss when predicting on a training or test image is comparable to that of a minimally trained model. In the output below, the first value is the overall loss and the remaining values are the per-class losses. Note that the loss for the first class is close to 0 even in an untrained model, since the model just predicts background:
#Loss before training
[3.9978203773498535, 0.032198667526245117, 0.99983119964599609, 0.99984711408615112, 0.99907118082046509, 0.96687209606170654]
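To illustrate why an all-background prediction gives a first-class loss near 0 and a loss near 1 for every other class, here is a minimal NumPy sketch. It assumes the per-class loss behaves like a soft Jaccard distance (1 minus intersection over union); the actual `jaccard_distance_loss` is not shown in this post, so the helper `jaccard_distance`, the `eps` smoothing term, and the toy volume are all hypothetical.

```python
import numpy as np

def jaccard_distance(y_true, y_pred, eps=1e-7):
    """Soft Jaccard distance: 1 - intersection / union (hypothetical helper).

    eps only guards against division by zero for empty masks.
    """
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return 1.0 - (intersection + eps) / (union + eps)

n_classes = 5

# Toy label volume: mostly background (class 0) with a few foreground voxels.
labels = np.zeros((8, 8, 8), dtype=int)
labels[2:4, 2:4, 2:4] = 1
labels[5, 5, 5] = 2
labels[6, 6, 6] = 3
labels[0, 0, 0] = 4
y_true = np.eye(n_classes)[labels]        # one-hot, shape (8, 8, 8, 5)

# A model that predicts background everywhere.
y_pred = np.zeros_like(y_true)
y_pred[..., 0] = 1.0

per_class = [float(jaccard_distance(y_true[..., k], y_pred[..., k]))
             for k in range(n_classes)]
total = sum(per_class)

print("overall:", total)
print("per class:", per_class)
```

Under these assumptions the background class gets a small distance because almost every voxel really is background, while each foreground class has zero overlap with the prediction and so ends up close to 1.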
After training the network very briefly on any given (train/test) sample, the evaluation error matches the error reported during training:
Epoch 1/5
1/1 [==============================] - 9s 9s/step - loss: 2.0542 - slice_layer_1_loss: 0.0048 - slice_layer_2_loss: 0.9998 - slice_layer_3_loss: 0.3026 - slice_layer_4_loss: 0.6302 - slice_layer_5_loss: 0.1167
Epoch 2/5
1/1 [==============================] - 1s 592ms/step - loss: 2.0278 - slice_layer_1_loss: 0.0045 - slice_layer_2_loss: 0.9998 - slice_layer_3_loss: 0.2916 - slice_layer_4_loss: 0.6191 - slice_layer_5_loss: 0.1128
Epoch 3/5
1/1 [==============================] - 1s 582ms/step - loss: 2.0066 - slice_layer_1_loss: 0.0043 - slice_layer_2_loss: 0.9998 - slice_layer_3_loss: 0.2888 - slice_layer_4_loss: 0.6066 - slice_layer_5_loss: 0.1071
Epoch 4/5
1/1 [==============================] - 1s 590ms/step - loss: 1.9909 - slice_layer_1_loss: 0.0042 - slice_layer_2_loss: 0.9998 - slice_layer_3_loss: 0.2872 - slice_layer_4_loss: 0.5959 - slice_layer_5_loss: 0.1038
Epoch 5/5
1/1 [==============================] - 1s 572ms/step - loss: 1.9787 - slice_layer_1_loss: 0.0041 - slice_layer_2_loss: 0.9998 - slice_layer_3_loss: 0.2855 - slice_layer_4_loss: 0.5875 - slice_layer_5_loss: 0.1019
#Loss after training
1/1 [==============================] - 0s 190ms/step
[2.1015677452087402, 0.0048453211784362793, 0.99983119964599609, 0.33013522624969482, 0.64043641090393066, 0.1263195276260376]
The problem seems very similar to this post: ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data
However, the model used in this demo has already been trained for 200 epochs over the full dataset, and training for more epochs does not solve the problem. Furthermore, the error reported on the validation set during training does not decrease at all, while predicting on test images shows the same behavior as described above.
Is it possible that there is some kind of problem when using batch normalization with fit_generator when the batch size is only 1?
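To make the suspicion concrete, here is a minimal NumPy sketch of how batch normalization can behave differently in training and inference mode when the batch size is 1. This is not Keras's implementation and not a confirmed diagnosis. It assumes the `momentum=0.5` passed to `Net(...)` is the batch normalization momentum, and the simulated volumes, the `normalize` helper, and the `eps` value are made up for illustration.

```python
import numpy as np

def normalize(x, mean, var, eps=1e-3):
    """Batch-norm transform without the learned scale/shift (gamma=1, beta=0)."""
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
axes = (0, 1, 2, 3)      # every axis except the channel axis (channels last)
momentum = 0.5           # assumption: mirrors momentum=0.5 passed to Net(...)
moving_mean, moving_var = np.zeros(2), np.ones(2)

# Simulate training with batch size 1: each "batch" is a single volume whose
# intensity offset and scale differ from the other volumes.
samples = [
    rng.normal(loc=rng.uniform(0, 10), scale=rng.uniform(1, 5),
               size=(1, 4, 4, 4, 2))
    for _ in range(20)
]
for x in samples:
    batch_mean, batch_var = x.mean(axis=axes), x.var(axis=axes)
    # Exponential moving averages, updated once per training step.
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    moving_var = momentum * moving_var + (1 - momentum) * batch_var

x = samples[0]
# Training mode: normalize with the statistics of this one sample.
train_out = normalize(x, x.mean(axis=axes), x.var(axis=axes))
# Inference mode: normalize with the accumulated moving averages.
test_out = normalize(x, moving_mean, moving_var)

gap = float(np.abs(train_out - test_out).mean())
print("mean absolute activation gap between modes:", gap)
```

In training mode every sample is normalized by its own statistics, so the activations the following layers see can differ from those produced at prediction time, when the moving averages are used instead. Whether this explains the behavior above depends on how the network's batch normalization layers are actually configured.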
PS: Here is the code for the generator:
import numpy as np
import random
def reverseArgMax(array,n):
    # One-hot encode integer class labels along a new last axis of length n
    newArray = np.empty(list(array.shape+(n,)))
    for i in range(n):
        temp = array.copy()
        if i==1:
            temp[temp!=1] = 0
        elif i==0:
            temp[temp==1] = 2
            temp[temp==i] = 1
            temp[temp!=1] = 0
        else:
            temp[temp==1] = 0
            temp[temp==i] = 1
            temp[temp!=1] = 0
        newArray[...,i]=temp
    return newArray
def generateData(data,labels,augmentor=None,batch_size=1):
    # Generates batches of samples (one sample per batch)
    while 1:
        # Visit the samples in a new random order on every pass
        imax = list(range(len(data)))
        np.random.shuffle(imax)
        for i in imax:
            x = np.array(data[i])
            y = np.array((labels[i]))
            # Move the channel axis to the end (channels last)
            x = np.transpose(x,axes=[0,2,3,1])
            # Add the batch axis
            x = np.array([x])
            y = np.array([y])
            x = x.astype(np.float32)
            y = y.astype(np.float32)
            y = reverseArgMax(y,5)
            # Augment
            if augmentor is not None:
                x,y = augmentor(x,y)
            x = x.astype(np.float32)
            y = y.astype(np.float32)
            # One flattened target per class: 5 arrays of shape (1, n_voxels)
            y = np.transpose(y,[4,0,1,2,3])
            y = np.reshape(y,[5,1,-1])
            yield x, list(y)
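For readers following the target layout, here is a small sketch of what the label handling in the generator produces, using a vectorized one-hot encoding in place of the loop in `reverseArgMax`. The helper name `one_hot` and the tiny toy label array are hypothetical; the sketch assumes the labels only contain the class ids 0 to 4.

```python
import numpy as np

def one_hot(labels, n):
    """Vectorized one-hot encoding; appends a class axis of length n."""
    return (labels[..., np.newaxis] == np.arange(n)).astype(np.float32)

# Toy label batch shaped (batch, depth, height, width) with class ids 0..4.
y = np.array([[[[0, 1], [2, 3]],
               [[4, 0], [1, 1]]]], dtype=np.float32)

encoded = one_hot(y, 5)                               # shape (1, 2, 2, 2, 5)
per_class = np.transpose(encoded, [4, 0, 1, 2, 3])    # shape (5, 1, 2, 2, 2)
targets = list(np.reshape(per_class, [5, 1, -1]))     # 5 arrays, each (1, 8)

for k, t in enumerate(targets):
    print("class", k, "target shape", t.shape)
```

Each element of `targets` is the flattened binary mask of one class, which matches the five per-class outputs (`slice_layer_1` to `slice_layer_5`) that appear in the training log.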