Possible solution
IMO you should use an almost standard categorical_crossentropy and have the network output logits, which are mapped in the loss function to the values [0, 1, 2, 3, 4] using the argmax operation (the same procedure is applied to the one-hot-encoded labels; see the last part of this answer for an example).
Using a weighted crossentropy you can treat incorrectness differently based on the predicted vs. correct values, as you indicated in the comments. All you have to do is take the absolute value of the difference between the correct and predicted values and multiply the loss by it; see the example below:
Let's map each encoding to its unary value (the argmax operation in the code below recovers such integer values from one-hot labels and logits):
[0, 0, 0, 0] -> 0
[1, 0, 0, 0] -> 1
[1, 1, 0, 0] -> 2
[1, 1, 1, 0] -> 3
[1, 1, 1, 1] -> 4
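For example, a minimal NumPy/Keras sketch (the array contents are just illustrative) that turns such unary encodings into integer targets and then into the one-hot labels expected by the loss below:

import keras
import numpy as np

# Unary/ordinal encodings as in the question (illustrative values)
unary = np.array([
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])

values = unary.sum(axis=1)                       # -> [0, 1, 2, 3, 4]
one_hot = keras.utils.to_categorical(values, 5)  # labels usable with the loss below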
And let's make some random targets and predictions (class indices taken from the model's softmax output) to see the point:
   correct   predicted
0        0           4
1        4           3
2        3           3
3        1           4
4        3           1
5        1           0
Now, when you subtract correct and predicted and take the absolute value, you essentially get a weighting column like this:
   weights
0        4
1        1
2        0
3        3
4        2
5        1
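In code, this weighting column is simply the elementwise absolute difference; a NumPy sketch reproducing the table above:

import numpy as np

correct = np.array([0, 4, 3, 1, 3, 1])
predicted = np.array([4, 3, 3, 4, 1, 0])

weights = np.abs(correct - predicted)  # -> [4, 1, 0, 3, 2, 1]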
As you can see, a prediction of 0 while the true target is 4 is weighted 4 times more than a prediction of 3 against the same target of 4, which is essentially what you want, if I understand correctly.
As Daniel Möller indicates in his answer, I would also advise you to create a custom loss function, though a slightly simpler one:
import tensorflow as tf

# Output logits from your network, not the values after softmax activation
def weighted_crossentropy(labels, logits):
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )
You should pass this loss to model.compile as well; I think there is no need to reiterate points already made.
Disadvantages of this solution:
- For correct predictions the weight, and thus the gradient, will be zero, which makes it harder for the network to strengthen its connections (push the logits towards +inf/-inf).
- This can be mitigated by adding random noise (additional regularization) to each weighted loss; it would act as a regularizer and might help.
- A better option might be to exclude the case where the prediction equals the target from the weighting (or set its weight to 1), as this would not add randomization to the network's optimization; see the sketch after this list.
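A minimal sketch of that last variant, assuming the same TF 1.x tf.losses API as above (the function name is just for illustration):

import tensorflow as tf

def weighted_crossentropy_min1(labels, logits):
    # Ordinal distance between predicted and true class, clamped to at least 1,
    # so correct predictions still produce a non-zero loss and gradient
    weights = tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1))
    weights = tf.maximum(weights, 1)
    return tf.losses.softmax_cross_entropy(labels, logits, weights=weights)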
Advantages of this solution:
- You can easily add weighting for an imbalanced dataset (e.g. certain classes occurring more often); see the sketch after this list.
- Maps cleanly to the existing API.
- Conceptually simple and stays within the classification realm.
- Your model cannot predict nonexistent classification values; e.g. with your multi-target case it could predict [1, 0, 1, 0], but there is no such possibility with the approach above. Fewer degrees of freedom help it train and remove the chance of nonsensical (if I understood your problem description correctly) predictions.
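Regarding the first point, a rough sketch of how per-class weights for an imbalanced dataset could be folded into the same weighting term (class_weights and its values are hypothetical, e.g. inverse class frequencies):

import tensorflow as tf

class_weights = tf.constant([1.0, 2.0, 1.0, 4.0, 1.0])  # hypothetical, one per class

def weighted_crossentropy_balanced(labels, logits):
    true_class = tf.argmax(labels, axis=1)
    distance = tf.abs(tf.argmax(logits, axis=1) - true_class)
    # Combine the ordinal distance with a per-class weight for the true label
    weights = tf.cast(distance, tf.float32) * tf.gather(class_weights, true_class)
    return tf.losses.softmax_cross_entropy(labels, logits, weights=weights)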
Additional discussion is provided in the chat room linked in the comments.
Example network with custom loss
Here is an example network with the custom loss function defined above.
Your labels have to be one-hot-encoded
in order for it to work correctly.
import keras
import numpy as np
import tensorflow as tf


# You could actually make it a lambda function as well
def weighted_crossentropy(labels, logits):
    return tf.losses.softmax_cross_entropy(
        labels,
        logits,
        weights=tf.abs(tf.argmax(logits, axis=1) - tf.argmax(labels, axis=1)),
    )


model = keras.models.Sequential(
    [
        keras.layers.Dense(32, input_shape=(10,)),
        keras.layers.Activation("relu"),
        keras.layers.Dense(10),
        keras.layers.Activation("relu"),
        keras.layers.Dense(5),  # raw logits, no softmax activation
    ]
)

data = np.random.random((32, 10))
labels = keras.utils.to_categorical(np.random.randint(5, size=(32, 1)))

model.compile(optimizer="rmsprop", loss=weighted_crossentropy)
model.fit(data, labels, batch_size=32)
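Since the model outputs raw logits, at inference time you would recover the predicted value with argmax (a small usage sketch reusing the model and data above):

predictions = model.predict(data)                   # raw logits, shape (32, 5)
predicted_values = np.argmax(predictions, axis=1)   # integers in [0, 4]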