I'm troubleshooting a Keras/TensorFlow U-Net for semantic segmentation. One thing I keep coming across is claims like this:
"Now the problem is using the softmax in your case as Keras don't support softmax on each pixel."
which is stated here (as well as lots of other places): Cross Entropy Loss for Semantic Segmentation Keras
The typical solution is to unroll the rows and cols into one dimension, so that the 4D output tensor of shape (batch, rows, cols, n_classes) becomes a 3D tensor of shape (batch, rows*cols, n_classes), and to apply a dense softmax to that.
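For reference, that workaround looks roughly like this (a minimal sketch of my own, not taken from the linked answer; the layer names and sizes are arbitrary):

from keras.layers import Input, Conv2D, Reshape, Activation
from keras.models import Model

rows, cols, n_classes = 3, 3, 2
inp = Input(shape=(rows, cols, 5))
logits = Conv2D(filters=n_classes, kernel_size=(1, 1), activation='linear')(inp)
# unroll the spatial dims: (batch, rows, cols, n_classes) -> (batch, rows*cols, n_classes)
flat = Reshape((rows * cols, n_classes))(logits)
# Activation('softmax') normalizes over the last axis, i.e. per pixel
out = Activation('softmax')(flat)
workaround_model = Model(inp, out)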
But this dummy example leads me to think that a Conv2D layer with a 1x1 kernel and 'softmax' activation DOES perform a per-pixel softmax.
Am I wrong?
Example:
import numpy as np
np.random.seed(345)
from keras.layers import Input
from keras.models import Model
from keras.layers import Conv2D
from keras.initializers import RandomUniform
# mimic the input to a per-pixel softmax classifier: in this case a 3x3 field with 5 channels
np.random.seed(234)
dummy_input = np.random.random(45).astype('float32')
dummy_input = dummy_input.reshape((1,3,3,5))
# build 2 networks with the same random weights (one linear, one softmax)
conv_input = Input(shape=(3,3,5,), dtype='float32')
pred_layer_softmax = Conv2D(filters=2, kernel_size=(1,1), strides=(1,1),
                            kernel_initializer=RandomUniform(minval=0., maxval=1.),
                            padding='valid', data_format='channels_last',
                            activation='softmax')(conv_input)
pred_layer_linear = Conv2D(filters=2, kernel_size=(1,1), strides=(1,1),
                           padding='valid', data_format='channels_last',
                           activation='linear')(conv_input)
# models
m_softmax = Model(conv_input, pred_layer_softmax)
m_linear = Model(conv_input, pred_layer_linear)
# keep weights the same for both networks
m_linear.set_weights(m_softmax.get_weights())
# predictions
pred = m_softmax.predict(dummy_input)
pred_lin = m_linear.predict(dummy_input)
Do the two networks make the same class predictions? Yes.
print('\nsoftmax pred')
print(pred.argmax(axis=3))
print('\nlinear_pred')
print(pred_lin.argmax(axis=3))
softmax pred
[[[1 1 0]
[1 1 0]
[1 0 1]]]
linear_pred
[[[1 1 0]
[1 1 0]
[1 0 1]]]
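(That agreement is expected regardless: exp is monotonic, and the per-pixel normalization divides all class scores at a pixel by the same constant, so the ordering, and hence the argmax, is preserved. You can assert it directly on the arrays above:)

# argmax over the class axis must match between the logits and the softmax output
assert np.array_equal(pred.argmax(axis=3), pred_lin.argmax(axis=3))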
Do the per-pixel class probabilities sum to 1.0 for the softmax network? Yes.
print('\nsoftmax - sum class probs')
print(pred.sum(axis=3))
print('\nlinear - sum class probs')
print(pred_lin.sum(axis=3))
softmax - sum class probs
[[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]]
linear - sum class probs
[[[ 1.88952112 2.50639653 2.06084657]
[ 1.81122136 2.21819067 2.01038122]
[ 1.92753291 1.85993922 2.27295876]]]
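As one more check, applying a softmax by hand along the class axis of the linear output should reproduce the softmax network's output exactly, if Conv2D's softmax really is per-pixel (a numpy sketch reusing pred and pred_lin from above):

# manual per-pixel softmax over the class axis (axis=3)
e = np.exp(pred_lin - pred_lin.max(axis=3, keepdims=True))  # subtract max for numerical stability
manual_softmax = e / e.sum(axis=3, keepdims=True)
print(np.allclose(pred, manual_softmax, atol=1e-6))  # should print True if the softmax is per-pixel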
What am I missing? This looks like per-pixel softmax is working fine, right?
Is there some fundamental thing I'm not understanding?
Thanks in advance.