
I have noticed that tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits) mainly performs 3 operations:

  1. Apply softmax to the logits (y_hat) in order to normalize them: y_hat_softmax = softmax(y_hat).

  2. Compute the elementwise cross-entropy terms: y_cross = y_true * tf.log(y_hat_softmax)

  3. Take the negative sum over the classes of each instance: -tf.reduce_sum(y_cross, reduction_indices=[1])

The code borrowed from here demonstrates this perfectly.

import numpy as np
import tensorflow as tf

y_true = tf.convert_to_tensor(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
y_hat = tf.convert_to_tensor(np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]]))

# first step
y_hat_softmax = tf.nn.softmax(y_hat)

# second step
y_cross = y_true * tf.log(y_hat_softmax)

# third step
result = - tf.reduce_sum(y_cross, 1)

# use tf.nn.softmax_cross_entropy_with_logits_v2
result_tf = tf.nn.softmax_cross_entropy_with_logits_v2(labels = y_true, logits = y_hat)

with tf.Session() as sess:
    sess.run(result)
    sess.run(result_tf)
    print('y_hat_softmax:\n{0}\n'.format(y_hat_softmax.eval()))
    print('y_true: \n{0}\n'.format(y_true.eval()))
    print('y_cross: \n{0}\n'.format(y_cross.eval()))
    print('result: \n{0}\n'.format(result.eval()))
    print('result_tf: \n{0}'.format(result_tf.eval()))

Output:

y_hat_softmax:
[[0.227863   0.61939586 0.15274114]
[0.49674623 0.20196195 0.30129182]]

y_true: 
[[0. 1. 0.]
[0. 0. 1.]]

y_cross: 
[[-0.         -0.4790107  -0.        ]
[-0.         -0.         -1.19967598]]

result: 
[0.4790107  1.19967598]

result_tf: 
[0.4790107  1.19967598]
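
For completeness, here is the same three-step computation done with plain NumPy (no TensorFlow), just as a sketch to confirm the arithmetic; it reuses the same y_true and y_hat values as above and should reproduce result up to floating-point precision.

import numpy as np

y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y_hat = np.array([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]])

# step 1: softmax along the class axis
exps = np.exp(y_hat)
y_hat_softmax = exps / exps.sum(axis=1, keepdims=True)

# step 2: elementwise product of labels and log-probabilities
y_cross = y_true * np.log(y_hat_softmax)

# step 3: negative sum over classes for each instance
result = -np.sum(y_cross, axis=1)
print(result)  # approximately [0.4790107, 1.19967598]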

However, the one-hot labels contain only 0s and 1s, so the cross-entropy for such a binary case is formulated as follows, as shown here and here:

-sum_i [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ]

I wrote code for this formula in the next cell, and its result differs from the one above. My question is: which one is better or right? Does TensorFlow also have a function to compute the cross-entropy according to this formula?

y_true = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
y_hat_softmax_from_tf = np.array([[0.227863, 0.61939586, 0.15274114],
                                  [0.49674623, 0.20196195, 0.30129182]])
comb = np.dstack((y_true, y_hat_softmax_from_tf))
#print(comb)

print('y_hat_softmax_from_tf: \n{0}\n'.format(y_hat_softmax_from_tf))
print('y_true: \n{0}\n'.format(y_true))

def cross_entropy_fn(sample):
    # For each (label, probability) pair, apply the binary cross-entropy term:
    # y * log(p) when the label is 1, and (1 - y) * log(1 - p) when the label is 0.
    output = []
    for label in sample:
        if label[0]:
            y_cross_1 = label[0] * np.log(label[1])
        else:
            y_cross_1 = (1 - label[0]) * np.log(1 - label[1])
        output.append(y_cross_1)
    return output

y_cross_1 = np.array([cross_entropy_fn(sample) for sample in comb])
print('y_cross_1: \n{0}\n'.format(y_cross_1))

result_1 = - np.sum(y_cross_1, 1)
print('result_1: \n{0}'.format(result_1))

Output:

y_hat_softmax_from_tf: 
[[0.227863   0.61939586 0.15274114]
[0.49674623 0.20196195 0.30129182]]

y_true: 
[[0. 1. 0.]
[0. 0. 1.]]

y_cross_1: 
[[-0.25859328 -0.4790107  -0.16574901]
[-0.68666072 -0.225599   -1.19967598]]

result_1: 
[0.90335299 2.11193571]
  • Be careful in the official documentation: WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results. It seems that y should not be passed to a softmax function. – Munichong May 23 '18 at 15:08
  • What is the difference of this V2 to the previous one? Can I just replace the code with the new V2? I got a deprecation message when running the tf 1.9 code for tf.nn.softmax_cross_entropy_with_logits(...) – Yingding Wang Aug 22 '18 at 14:39

1 Answer


Your formula is correct, but it works only for binary classification. The demo code in TensorFlow classifies 3 classes, so it's like comparing apples to oranges. One of the answers you refer to mentions it too:

This formulation is often used for a network with one output predicting two classes (usually positive class membership for 1 and negative for 0 output). In that case i may only have one value - you can lose the sum over i.

The difference between these two formulas (binary cross-entropy vs multinomial cross-entropy) and when each one is applicable is well-described in this question.
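
Roughly, the two losses being contrasted can be sketched as follows (with $y$ the label, $\hat{y}$ the logits, and $\sigma$ the sigmoid; this summary is mine, not from the linked question):

$L_{\text{softmax}} = -\sum_i y_i \log\big(\mathrm{softmax}(\hat{y})_i\big)$

$L_{\text{sigmoid}} = -\big[\, y \log \sigma(\hat{y}) + (1 - y)\log(1 - \sigma(\hat{y})) \,\big]$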

The answer to your second question is yes, there is such a function: tf.nn.sigmoid_cross_entropy_with_logits. See the above-mentioned question.
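
A minimal sketch (not part of the original answer) of how that op relates to your hand-written loop, reusing the logits from the question. Note that sigmoid_cross_entropy_with_logits expects raw logits and applies the sigmoid internally, so its values will not match your result_1, which was computed from softmax probabilities:

import tensorflow as tf

logits = tf.constant([[0.5, 1.5, 0.1], [2.2, 1.3, 1.7]], dtype=tf.float64)
labels = tf.constant([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], dtype=tf.float64)

# one independent binary cross-entropy term per class, shape (2, 3)
per_class = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
# sum over classes to get one loss value per instance
per_instance = tf.reduce_sum(per_class, axis=1)

with tf.Session() as sess:
    print(sess.run(per_class))
    print(sess.run(per_instance))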

Maxim