
I'm pretty sure this is a silly question but I can't find it anywhere else so I'm going to ask it here.

I'm doing semantic image segmentation using a CNN (U-Net) in Keras with 7 labels, so my label for each image is (7, n_rows, n_cols) using the Theano backend. Across the 7 channels, each pixel is one-hot encoded. In this case, is the correct loss function categorical cross-entropy? It seems that way to me, but the network seems to learn better with binary cross-entropy loss. Can someone shed some light on why that would be, and on what the principled objective is?
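For concreteness, here is a toy version of how one of my label tensors is laid out (illustrative numpy only; the sizes are made up):

import numpy as np

n_rows, n_cols = 4, 4
class_map = np.random.randint(0, 7, size=(n_rows, n_cols))  # integer label per pixel
labels = np.eye(7)[class_map].transpose(2, 0, 1)            # -> (7, n_rows, n_cols)
assert (labels.sum(axis=0) == 1).all()                      # one-hot across the 7 channels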

– TSW

2 Answers


Binary cross-entropy loss should be used with a sigmoid activation in the last layer, and it severely penalizes opposite predictions. It does not take into account that the output is one-hot encoded and that the sum of the predictions over the channels should be 1. But since mis-predictions are severely penalized, the model still learns to classify reasonably well.
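For reference, that pairing looks something like this (a sketch in the Keras 1 style used below; the 1x1 convolution head is illustrative):

from keras.layers import Convolution2D

# Per-pixel sigmoid scores, trained with binary cross-entropy.
model.add(Convolution2D(7, 1, 1, activation='sigmoid', border_mode='same'))
model.compile(loss='binary_crossentropy', optimizer='adam')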

The way to enforce the one-hot prior is to use a softmax activation with categorical cross-entropy. This is what you should use.

Now the problem is using the softmax in your case, as Keras doesn't support a softmax on each pixel.

The easiest way to go about it is to permute the dimensions to (n_rows, n_cols, 7) using a Permute layer and then reshape to (n_rows*n_cols, 7) using a Reshape layer. Then you can add the softmax activation layer and use cross-entropy loss. The data should also be reshaped accordingly.
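A minimal sketch of that approach (assuming the Theano channels-first ordering from the question; the optimizer is a placeholder):

from keras.layers import Permute, Reshape, Activation

model.add(Permute((2, 3, 1)))               # (7, n_rows, n_cols) -> (n_rows, n_cols, 7)
model.add(Reshape((n_rows * n_cols, 7)))    # one row per pixel
model.add(Activation('softmax'))            # softmax over the 7 labels for each pixel
model.compile(loss='categorical_crossentropy', optimizer='adam')

# The labels need the same treatment, e.g. with numpy:
# y = y.transpose(0, 2, 3, 1).reshape(-1, n_rows * n_cols, 7)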

The other way of doing so is to implement a depth softmax:

from keras import backend as K

def depth_softmax(matrix):
    # Squash each score into (0, 1) with a sigmoid, then normalize across
    # the channel axis so the values for each pixel sum to 1.
    sigmoid = lambda x: 1 / (1 + K.exp(-x))
    sigmoided_matrix = sigmoid(matrix)
    softmax_matrix = sigmoided_matrix / K.sum(sigmoided_matrix, axis=0)
    return softmax_matrix

and use it in a Lambda layer:

model.add(Deconvolution2D(7, 1, 1, border_mode='same', output_shape=(7, n_rows, n_cols)))
model.add(Permute((2, 3, 1)))
model.add(BatchNormalization())
model.add(Lambda(depth_softmax))

If the tf image_dim_ordering is used, then you can do away with the Permute layers.
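For instance (an illustrative sketch of the reshape approach with channels-last ordering):

# With image_dim_ordering='tf' the network already outputs (n_rows, n_cols, 7),
# so no Permute layer is needed.
model.add(Reshape((n_rows * n_cols, 7)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')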

For more details, check here.

– indraforyou
  • Thank you for your very detailed answer! I went with the reshape, softmax and categorical cross-entropy. Would you expect any substantial performance differences between the two methods in terms of speed or final accuracy? Thanks again! – TSW Feb 09 '17 at 17:23
  • I haven't worked on this scenario myself, but you can check both of them. Another thing you can try is to first create a model with a `sigmoid` final layer and binary cross-entropy loss, and once training is done, replace the top layer with `softmax` and retrain with categorical cross-entropy (see the sketch after these comments). The second training will converge quickly, and I would bet the overall training time will be reduced, with better final accuracy. – indraforyou Feb 09 '17 at 22:13
  • Hi indraforyou, I am also working on a semantic segmentation case. The masked image is represented as (1, n_rows, n_cols). For this case, can I use sigmoid and binary cross-entropy? Are there any specific procedures to be included? – user297850 Feb 12 '17 at 16:55
  • @user297850 For a single channel you don't need to do anything special. You can simply use sigmoid and binary cross-entropy. – indraforyou Feb 12 '17 at 19:06
  • @indraforyou I am kind of confused by your function. `axis=0` means that you take the sum along the rows? I would assume it should be `axis=-1`, as you want to sum along the depth (however, replacing `axis=0` with `axis=1` does not work; it throws an error). Also, does this really result in the softmax? I would assume it to be `lambda x: K.exp(x)` instead... – AljoSt Nov 08 '18 at 13:32
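A rough sketch of the two-stage idea from the comment above (illustrative; `X_train`, `y_train` and the epoch counts are placeholders):

from keras.models import Sequential
from keras.layers import Activation

# Stage 1: train with a sigmoid head and binary cross-entropy.
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X_train, y_train, nb_epoch=20)

# Stage 2: reuse the trained layers with a softmax head instead,
# then retrain with categorical cross-entropy.
model2 = Sequential(model.layers[:-1])   # everything except the sigmoid
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='adam')
model2.fit(X_train, y_train, nb_epoch=5)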

I tested the solution of @indraforyou and think that the proposed method has some mistakes. As the comment section does not allow for proper code segments, here is what I think is the fixed version:

from keras import backend as K

def depth_softmax(matrix):
    # A true softmax over the channel axis: exponentiate, then normalize
    # so the channel values for each pixel sum to 1.
    exp_matrix = K.exp(matrix)
    softmax_matrix = exp_matrix / K.expand_dims(K.sum(exp_matrix, axis=-1), axis=-1)
    return softmax_matrix

This method expects the ordering of the matrix to be (height, width, channels).
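A quick way to sanity-check it (illustrative; uses the Keras backend API):

import numpy as np
from keras import backend as K

x = K.variable(np.random.randn(4, 4, 7))   # (height, width, channels)
out = K.eval(depth_softmax(x))
print(out.sum(axis=-1))                     # ~1.0 for every pixel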

– AljoSt