I have an unbalanced dataset that I use to train my neural network. The per-class label counts in the training data are:
training_data_classnums = [1480, 1104, 1261, 942, 2612, 1109, 355, 355]
I use softmax to compute the probability of each class in the loss function, and I apply the class weights like this:
training_data_classnums = [1480, 1104, 1261, 942, 2612, 1109, 355, 355]
class_sum = sum(training_data_classnums)
# each class weight = its share of the training data
training_data_weights = [x / class_sum for x in training_data_classnums]
print("class_weights {}".format(training_data_weights))
...
# one-hot encode the integer labels (8 classes)
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=8)
class_weight = tf.constant(training_data_weights)
# scale each logit column by its class weight before the softmax cross-entropy
weighted_logits = tf.multiply(logits, class_weight)
tf.logging.info("logits_weighted {} {}".format(weighted_logits.get_shape(), logits.get_shape()))
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=weighted_logits)
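For reference, with the counts above the printed class_weights come out to roughly [0.161, 0.120, 0.137, 0.102, 0.283, 0.120, 0.039, 0.039], i.e. the most frequent class gets the largest weight.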
Is this the correct way to calculate the weights for each class, and is this the right way to apply them in the loss calculation? (I use the tf.Estimator API.)
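For comparison, the other pattern I am unsure about is passing a per-example weight through the weights argument of tf.losses.softmax_cross_entropy instead of scaling the logits. A minimal sketch, assuming labels is a 1-D tensor of integer class ids:

# gather one weight per example from the class-weight vector
class_weight = tf.constant(training_data_weights)
sample_weights = tf.gather(class_weight, tf.cast(labels, tf.int32))  # shape [batch_size]
loss = tf.losses.softmax_cross_entropy(
    onehot_labels=onehot_labels,
    logits=logits,
    weights=sample_weights)

Is multiplying the logits by the class weights equivalent to this, or should one of the two be preferred?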