What should the output layer of my CNN look like?

Question

I am building a model to detect a few interesting features in an image. I have a set of images measuring 600x200 px. These images contain features, such as rock fragments, that I would like to identify. Imagine a (4x12) grid overlaid on the image. Using an annotator tool, I can manually produce annotations such as ((4,9), (3,10), (3,11), (3,12)) that identify the interesting cells in the image. I can build a CNN model with Keras that takes a grayscale image as input. But how should I encode the output? One way that seems intuitive to me is to treat it as a sparse matrix of shape (12,4,1) in which only the interesting cells have 1 while the others have 0.

Is there a better way to encode the outputs?
What should be the activation function on the last layer be? I am using ReLU for the hidden layers.
What should the loss function be? Will mean_squared_error work?

Why don't you split up your image and classify every grid field separately with a "smaller network (4*12)"? — Florian H, Jul 07 '17 at 12:44
These images are flattened from a circular borehole. So I am interested in sinusoidal patterns to identify interesting geological features as opposed to other rock features that look similar but are not sinusoidal. — Vagmi Mudumbai, Jul 07 '17 at 12:55

score 2 · Answer 1 · answered Jul 07 '17 at 12:21

Your problem is really similar to both detection and segmentation problems (you can read about it e.g. here). The approach you proposed is reasonable because in both detection and segmentation tasks computing the feature map you proposed is a usual part of the training pipeline. However, there are several problems you might come across:

memory issues: you need to either work with sparse tensors or use generators to keep memory usage manageable,
loss and activation: loss and activation for segmentation are currently not supported by the Keras API, so you need to implement them on your own. Here and here you can find examples of how to tackle this problem.
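
As a rough illustration of the generator idea, here is a minimal sketch that keeps the annotations as short lists of cells and only builds the dense masks one batch at a time. The names cells_to_mask and batch_generator are hypothetical helpers, not part of Keras. I am also assuming that the annotations are 1-based (row, col) pairs on a grid of 4 rows by 12 columns, and that images are stored as arrays of shape (200, 600, 1).

```python
import numpy as np

GRID_ROWS, GRID_COLS = 4, 12  # assumed layout: 4 rows x 12 columns


def cells_to_mask(cells, rows=GRID_ROWS, cols=GRID_COLS):
    """Turn 1-based (row, col) annotations into a dense 0/1 mask of shape (rows, cols, 1)."""
    mask = np.zeros((rows, cols, 1), dtype=np.float32)
    for r, c in cells:
        mask[r - 1, c - 1, 0] = 1.0  # shift to 0-based indices
    return mask


def batch_generator(images, annotations, batch_size=8):
    """Yield (image_batch, mask_batch) forever, building the masks on the fly."""
    n = len(images)
    while True:
        order = np.random.permutation(n)  # reshuffle on every pass over the data
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x = np.stack([images[i] for i in idx])
            y = np.stack([cells_to_mask(annotations[i]) for i in idx])
            yield x, y
```

If the images themselves do not fit in memory, the same pattern should still work with images replaced by a list of file paths that are loaded inside the loop.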

For detection only (without classifying these points), I would advise you to use sigmoid and binary_crossentropy. For classification, use softmax and categorical_crossentropy.
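
For the detection-only case, a minimal sketch of what this could look like is shown below. It is only one possible architecture under my own assumptions, not a recommended design: the input is a grayscale image of shape (200, 600, 1), the filter counts are arbitrary, and the pooling sizes (2, 5, 5) are chosen only so that the feature map shrinks to the 4 x 12 grid. The function name build_grid_detector is made up for this example. The output here is (4, 12, 1), rows first, rather than the (12, 4, 1) mentioned in the question.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_grid_detector(input_shape=(200, 600, 1)):
    """Small fully convolutional net: one sigmoid score per grid cell."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=2)(x)   # 200x600 -> 100x300
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=5)(x)   # 100x300 -> 20x60
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=5)(x)   # 20x60 -> 4x12
    # 1x1 convolution with sigmoid: independent probability per cell
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)


model = build_grid_detector()
# binary_crossentropy treats every grid cell as its own yes/no decision
model.compile(optimizer="adam", loss="binary_crossentropy")
```

For the classification variant, the last layer would instead have one channel per class with a softmax activation, and the loss would be categorical_crossentropy.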

Of course, there are other ways to tackle this problem. One could solve it as a regression in which you predict the pixels where there is something interesting. But dealing with varying input in Keras is rather cumbersome.
