I am running a model to detect a few interesting features in an image. I have a set of images measuring 600x200 px. These images have features such as rock fragments that I would like to identify. Imagine a (4x12) grid overlayed on the image I can produce annotations manually using an annotator tool such as ((4,9), (3,10), (3,11), (3,12))
to identify the interesting cells in the image. I can build a CNN model with Keras with the input as a grayscale image. But how should I encode the output. One way that seems intuitive to me is to treat it as a sparse matrix of shape (12,4,1)
and only the interesting cells have 1 while others have 0.
- Is there a better way to encode the outputs?
- What should be the activation function on the last layer be? I am using ReLU for the hidden layers.
- What should the loss function be? Will
mean_squared_error
work?