
I need some help understanding what's going on here.

My goal is a network that receives sizeXsize images and returns a binary sizeXsize matrix indicating, for each pixel, whether it contains a feature or not.

For example, think of a corner-detection network where the output layer tells whether a pixel is exactly the tip of a corner. That is, we want to detect only the tip pixel of a corner like the one in the image below:

[image: an example corner whose tip pixel should be detected]

The first layers of the network are defined as follows:

from keras import models, layers
import numpy as np

size=5

input_image = layers.Input(shape=(size, size, 1))

# padding='same' with strides=1 keeps the spatial dimensions at sizeXsize
b = layers.Conv2D(5, (3, 3), activation='relu', padding='same')(input_image)
b = layers.MaxPooling2D((2, 2), strides=1, padding='same')(b)
b = layers.Conv2D(5, (3, 3), activation='relu', padding='same')(b)
b_out = layers.MaxPooling2D((2, 2), strides=1, padding='same')(b)

So far I have maintained the dimensions of the original input (sizeXsize).

Now I would like the output layer to be a dense layer with sizeXsize pixels.

If I use `output = layers.Dense(size, activation='sigmoid')(b_out)`, the layer built is sizeXsizeXsize, and if I use `output = layers.Dense(1, activation='sigmoid')(b_out)`, the size is sizeXsize. How come?
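For reference, a minimal way to inspect the two shapes (a standalone check using keras.backend.int_shape; the printed shapes assume size=5 as above):

from keras import backend as K

out_a = layers.Dense(size, activation='sigmoid')(b_out)
out_b = layers.Dense(1, activation='sigmoid')(b_out)
print(K.int_shape(out_a))  # (None, 5, 5, 5) -> sizeXsizeXsize
print(K.int_shape(out_b))  # (None, 5, 5, 1) -> sizeXsize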

This is the model building and compilation part of the code:

model = models.Model(input_image, output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

What am I missing here? Isn't `output = layers.Dense(1, activation='sigmoid')(b_out)` just a single neuron?

The thing is that if I train:

n_images = 100
data = np.random.randint(0, 2, (n_images, size, size, 1))
labels = data  # the targets are the inputs themselves (an identity mapping)

model.fit(data, labels, verbose=1, batch_size=4, epochs=20)

and if I test it:

data1 = np.random.randint(0, 2, (n_images, size, size, 1))
score, acc = model.evaluate(data1, data1, verbose=1)

print('Test score:', score)
print('Test accuracy:', acc)

a = np.random.randint(0, 2, (1, size, size, 1))
prediction = model.predict(a)

print(a == np.round(prediction))  # True wherever the rounded prediction matches the input

I get good accuracy, and the sizes seem correct for the output layer:

100/100 [==============================] - 0s 349us/step
Test score: 0.187119951248
Test accuracy: 0.926799981594
[[[[ True]
   [ True]
   [ True]
   [ True]
   [ True]]

  [[ True]
   [ True]
   [ True]
   [ True]
   [ True]]

  [[ True]
   [ True]
   [ True]
   [ True]
   [ True]]

  [[ True]
   [ True]
   [ True]
   [ True]
   [ True]]

  [[ True]
   [ True]
   [ True]
   [ True]
   [ True]]]]

If I read the Dense documentation:

units: Positive integer, dimensionality of the output space.

So how come, if I put `layers.Dense(1, activation='sigmoid')(b_out)`, I get an output layer of sizeXsize?

  • "which indicates whether the feature was detected or not in each pixel" If you would like to scan pixel-by-pixel, I think you should adjust your kernel sizes `layers.Conv2D(5, **(3,3)**` or you will include multiple pixels in each stride. – from keras import michael Oct 05 '18 at 03:16
  • @fromkerasimportmichael So what exactly would you suggest doing? I have an image where each pixel either has a feature or not, and whether a pixel has this feature depends on information from its surrounding pixels. – 0x90 Oct 05 '18 at 04:05
  • Oh, sorry. I read "which indicates whether the feature was detected or not in each pixel" as meaning that each pixel was independent of the surrounding pixels. – from keras import michael Oct 05 '18 at 04:09
  • @fromkerasimportmichael A toy example of what I am trying to do is to find the corner pixel in images. The input will be a `sizeXsize` image and the output will be a `sizeXsize` binary image. – 0x90 Oct 05 '18 at 04:54

2 Answers


The trick is not to use a conventional Dense layer, but a convolutional layer with kernel size (1,1), i.e. something like this:

b = layers.Conv2D(5, (3, 3), activation='relu', padding='same')(input_image)
b = layers.MaxPooling2D((2, 2), strides=1, padding='same')(b)
b = layers.Conv2D(5, (3, 3), activation='relu', padding='same')(b)
b = layers.MaxPooling2D((2, 2), strides=1, padding='same')(b)
# use Conv2D with a (1,1) kernel instead of Dense
binary_out = layers.Conv2D(1, (1, 1), activation='sigmoid', padding='same')(b)
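For completeness, a minimal sketch of wiring this into the rest of the code (same variable names and compile settings as in the question):

model = models.Model(input_image, binary_out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()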
– pitfall
  • Could you explain why Dense doesn't work? How come the output of my network is `sizeXsize` at the end, as you can see from my `score, acc = model.evaluate(data1, data1, verbose=1)` line? Although your trick is very nice, using it connects information only from local pixels, not from long-range ones. – 0x90 Oct 05 '18 at 05:20
  • Because `Dense` expects a 2D input and produces a 2D output, while you expect a 4D output. The use of `Conv2D` with kernel size (1,1) is completely equivalent to doing the following: 1) reshape your 4D feature tensor into 2D, with new dimensions `(batch_size x nb_rows x nb_cols, nb_feats)`; 2) apply a `Dense` layer whose input dimension is `nb_feats` and output dimension is 1 (see the sketch after these comments). – pitfall Oct 05 '18 at 06:54
  • @user36624 That's not true. The current implementation of the `Dense` layer accepts n-dimensional inputs and produces n-dimensional outputs, where `n` can be any positive integer. – today Oct 05 '18 at 09:34
  • Both `Dense(1)` and `Conv2D(1,(1,1))` have 129 parameters: `dense_1 (Dense) (None, 2000, 2000, 1) 129` and `conv2d_3 (Conv2D) (None, 2000, 2000, 1) 129`. – 0x90 Oct 05 '18 at 15:47
  • @0x90 That's because they are essentially the same thing. – today Oct 05 '18 at 19:34
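To illustrate the equivalence described in the comments above, here is a minimal standalone sketch (hypothetical dimensions: a 5x5 image with 5 feature channels, matching the question's architecture):

from keras import models, layers

H = W = 5  # spatial dimensions (nb_rows, nb_cols)
F = 5      # feature channels (nb_feats)

inp = layers.Input(shape=(H, W, F))
x = layers.Reshape((H * W, F))(inp)           # 1) flatten the spatial dims, keep the features
x = layers.Dense(1, activation='sigmoid')(x)  # 2) Dense over the F features, one output per pixel
out = layers.Reshape((H, W, 1))(x)            # restore the image layout
models.Model(inp, out).summary()              # 6 parameters, same as Conv2D(1, (1,1)) on 5 channels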

Your confusion stems from the fact that the Dense layer is currently implemented such that it is applied to the last axis of the input data. That's why when you feed the output of the MaxPooling layer (i.e. b_out), which has a shape of (size, size, 5), into a Dense layer with one unit, you get an output of shape (size, size, 1). In this case, the single neuron in the Dense layer is connected to each of the 5 elements along the last axis, using the same weights at every spatial position (that's why if you take a look at the summary() output, you will see that the Dense layer has 6 parameters: 5 weights plus one bias).
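A minimal standalone check of this behavior (the input shape assumes size=5, as in the question):

from keras import models, layers

inp = layers.Input(shape=(5, 5, 5))               # the (size, size, 5) shape of b_out
out = layers.Dense(1, activation='sigmoid')(inp)  # applied along the last axis only
models.Model(inp, out).summary()                  # output shape (None, 5, 5, 1) with 6 parameters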

You can use either a Dense layer (with one unit) or a Conv2D layer (with one filter) as the last layer. If you ask which one works better, the answer is: it depends on the specific problem you are working on and the data you have. However, you can take some ideas from image segmentation networks, where first the image is processed with a combination of Conv2D and MaxPooling2D layers (its spatial dimensions shrinking as we go forward in the model), and then some upsampling layers and Conv2D layers are used to get back an image with the same size as the input. Here is a sketch (though you don't need to use TimeDistributed and LSTM layers for your case).
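As a rough illustration of that encode-then-upsample idea, a minimal sketch (the layer widths and the even input size are arbitrary assumptions, not the architecture linked above):

from keras import models, layers

size = 8  # assumed even, so one 2x downsample/upsample pair restores the size

inp = layers.Input(shape=(size, size, 1))
# encoder: extract features while reducing spatial resolution
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(inp)
x = layers.MaxPooling2D((2, 2))(x)                       # size/2 x size/2
# decoder: upsample back to the input resolution
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)                       # back to size x size
out = layers.Conv2D(1, (1, 1), activation='sigmoid')(x)  # per-pixel binary map

model = models.Model(inp, out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])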

– today