I have been using Zhixuhao's implementation of U-Net for binary semantic segmentation, and I modified it slightly following the suggestions from this Stack Overflow answer:
Keras, binary segmentation, add weight to loss function
so that I can use a pixel-wise weighted binary cross-entropy, as in the original U-Net paper (see page 5), to force my U-Net to learn border pixels. Essentially, the idea is to add a Lambda layer that computes the pixel-wise weighted cross-entropy within the model itself, and then use an "identity loss" that simply passes the network's output through.
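For concreteness, here is that pattern in isolation on a toy one-layer model (names and sizes are purely illustrative; note that this sketch uses the element-wise backend function K.binary_crossentropy, whereas my actual code below goes through keras.losses.binary_crossentropy and multiply()):

import keras.backend as K
from keras.layers import Input, Conv2D, Lambda
from keras.models import Model
from keras.optimizers import Adam

def toy_weighted_loss(tensors):
    y_pred, weights, y_true = tensors
    # element-wise cross-entropy (no reduction), scaled by the per-pixel weight map
    return K.binary_crossentropy(y_true, y_pred) * weights

def toy_identity_loss(y_true, y_pred):
    # the "real" loss is already computed inside the model
    return y_pred

img = Input((64, 64, 1))
wei = Input((64, 64, 1))
msk = Input((64, 64, 1))
pred = Conv2D(1, 1, activation='sigmoid')(img)
loss_out = Lambda(toy_weighted_loss, name='loss_output')([pred, wei, msk])
toy = Model(inputs=[img, wei, msk], outputs=loss_out)
toy.compile(optimizer=Adam(lr=1e-4), loss=toy_identity_loss)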
Here is what my input data looks like:
[images: input image, ground-truth mask, weight map]
And here is what my code looks like:
def unet(pretrained_weights = None, input_size = (256,256,1)):
    inputs = Input(input_size)
    # [... U-Net architecture from Zhixuhao's model.py file ...]
    conv10 = Conv2D(1, 1, activation = 'sigmoid', name='true_output')(conv9)
    # extra inputs for the weight map and the ground-truth mask
    mask_weights = Input(input_size)
    true_masks = Input(input_size)
    # pixel-wise weighted loss computed inside the model
    loss1 = Lambda(weighted_binary_loss, output_shape=input_size, name='loss_output')([conv10, mask_weights, true_masks])
    model = Model(inputs = [inputs, mask_weights, true_masks], outputs = loss1)
    model.compile(optimizer = Adam(lr = 1e-4), loss = identity_loss)
    return model
And I added these two functions:
def weighted_binary_loss(X):
    y_pred, weights, y_true = X
    # pixel-wise cross-entropy, then scaled by the weight map
    loss = keras.losses.binary_crossentropy(y_pred, y_true)
    loss = multiply([loss, weights])
    return loss

def identity_loss(y_true, y_pred):
    # the model output already is the loss, so just pass it through
    return y_pred
And finally here is the relevant part of my main.py:
input_size = (256,256,1)
target_size = (256,256)
myGene = trainGenerator(5,'data/moma/train','img','seg','wei',data_gen_args,save_to_dir=None,target_size=target_size)
model = unet(input_size=input_size)
model_checkpoint = ModelCheckpoint('unet_moma_weights.hdf5',monitor='loss',verbose=1, save_best_only=True)
model.fit_generator(myGene,steps_per_epoch=300,epochs=5,callbacks=[model_checkpoint])
Now this code runs fine: I can train my U-Net and it does learn border pixels, but only if I resize my input images to 256*256. If I instead use input_size=(256,32,1) and target_size=(256,32) in main.py, which are the relevant dimensions for my data and allow me to use bigger batch sizes, I get the following error:
ValueError: Operands could not be broadcast together with shapes (256, 32, 1) (256, 32)
The error is raised for the line loss = multiply([loss, weights]). And indeed, the weights have one extra singleton dimension compared to the loss. I don't understand why the error is not raised when I use 256*256 inputs. I tried to make both tensors the same shape with either K.expand_dims() or Reshape() (sketched below); the code then runs without errors and the loss converges, but when I test my network on new inputs I get blank outputs (i.e. fully grey, white or black images, or outputs that have nothing to do with my inputs).
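For reference, the kind of change I mean by "creating dimensions" is along these lines (a sketch of the K.expand_dims variant; the Reshape one is analogous, and this assumes from keras import backend as K):

def weighted_binary_loss(X):
    y_pred, weights, y_true = X
    loss = keras.losses.binary_crossentropy(y_pred, y_true)
    # re-add a trailing channel axis so the loss shape matches the weights again
    loss = K.expand_dims(loss, axis=-1)
    loss = multiply([loss, weights])
    return loss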
So, this is a lot of text for the following question: why does multiply() raise an error in the 256*32 case but not in the 256*256 case, and why does creating/removing dimensions on the inputs not help?
Thanks!
PS: After training, in order to get the network to output the actual prediction instead of the pixel-wise loss, I remove the loss layer and the two extra input layers with the following code:
new_model = Model(inputs=model.inputs,outputs=model.get_layer("true_output").output)
new_model.compile(optimizer = Adam(lr = 1e-4), loss = 'binary_crossentropy')
new_model.set_weights(model.get_weights())
This works fine (again, at least in the 256*256 case).
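In case it matters, prediction with new_model as defined above then looks roughly as follows; since model.inputs still lists the two extra inputs, I feed zero arrays for them (they have no influence on true_output). Here test_imgs stands for whatever batch of test images I am segmenting, with shape (n, 256, 256, 1):

import numpy as np

dummy = np.zeros_like(test_imgs)                        # placeholders for the weight and mask inputs
preds = new_model.predict([test_imgs, dummy, dummy])    # preds has shape (n, 256, 256, 1)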