13

Can someone help me understand this problem a bit better? I need to train a neural network that outputs 200 mutually independent categories, where each category is a percentage ranging from 0 to 1. This looks like a binary_crossentropy problem to me, but every example I see on the internet uses binary_crossentropy with a single output. Since my output has 200 values, would applying binary_crossentropy still be correct?

This is what I have in mind. Is this a correct approach, or should I change it?

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(input_shape,))
hidden = Dense(2048, activation='relu')(inputs)
hidden = Dense(2048, activation='relu')(hidden)
output = Dense(200, name='output_cat', activation='sigmoid')(hidden)
model = Model(inputs=inputs, outputs=[output])
loss_map = {'output_cat': 'binary_crossentropy'}
model.compile(loss=loss_map, optimizer='sgd', metrics=['mae', 'accuracy'])
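
For reference, the targets I'd pass to fit would be a float array of shape (num_samples, 200) with values in [0, 1]. A minimal sketch with made-up data (num_samples and the random values are just placeholders):

import numpy as np

# Hypothetical data: one independent value in [0, 1] per category
num_samples = 1000
X_train = np.random.rand(num_samples, input_shape)
y_train = np.random.rand(num_samples, 200)

model.fit(X_train, {'output_cat': y_train}, epochs=10, batch_size=32)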
RaduS
  • I think your approach is okay. You can search for multi-label examples instead of the binary classification ones. – Yu-Yang Oct 28 '17 at 12:50

5 Answers

14

To optimize for multiple independent binary classification problems (as opposed to a multi-class problem, where you would use categorical_crossentropy) in Keras, you could do the following (here I take the example of 2 independent binary outputs, but you can extend it as far as needed):

    from keras.layers import Input, Dense, Lambda
    from keras.models import Model
    from keras import optimizers

    inputs = Input(shape=(input_shape,))
    hidden = Dense(2048, activation='relu')(inputs)
    hidden = Dense(2048, activation='relu')(hidden)
    output = Dense(units=2, activation='sigmoid')(hidden)

Here you split the output using Keras's Lambda layer:

    output_1 = Lambda(lambda x: x[...,:1])(output)
    output_2 = Lambda(lambda x: x[...,1:])(output)

    adad = optimizers.Adadelta()

Your model output then becomes a list of the independent outputs:

    model = Model(inputs, [output_1, output_2])

You compile the model with one loss function per output, passed as a list. (In fact, if you give only a single loss function, I believe it will be applied to each output independently.)

    model.compile(optimizer=adad, loss=['binary_crossentropy','binary_crossentropy'])
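
To train this, you would pass one target array per output. A minimal sketch, assuming hypothetical arrays X_train and y_train, where y_train holds one 0/1 column per independent output:

    import numpy as np

    # Hypothetical data: one binary label column per independent output
    X_train = np.random.rand(1000, input_shape)
    y_train = np.random.randint(0, 2, size=(1000, 2)).astype('float32')

    model.fit(X_train,
              [y_train[:, :1], y_train[:, 1:]],  # one target array per output
              epochs=10, batch_size=32)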
deepit
12

I know this is an old question, but I believe the accepted answer is incorrect and the most upvoted answer is workable but not optimal. The original poster's method is the correct way to solve this problem. The output is 200 independent probabilities from 0 to 1, so the output layer should be a dense layer with 200 neurons and a sigmoid activation. It's not a categorical_crossentropy problem, because the 200 categories are not mutually exclusive. Also, there's no reason to split the output with a Lambda layer when a single dense layer will do. Here's another way to do it, using the Keras Sequential API.

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(2048, input_dim=n_input, activation='relu'))
model.add(Dense(2048, activation='relu'))
model.add(Dense(200, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
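
Regarding how binary_crossentropy behaves with many outputs: as far as I know, it averages the per-output cross-entropy over the last axis, treating each sigmoid unit as its own binary problem. A minimal numerical check (assuming TensorFlow 2.x; the toy values are made up):

import numpy as np
import tensorflow as tf

y_true = np.array([[1.0, 0.0, 1.0]])  # 3 independent binary labels
y_pred = np.array([[0.9, 0.2, 0.6]])  # sigmoid outputs

# Keras's binary_crossentropy reduces over the last axis
keras_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()

# Per-output cross-entropy computed by hand, averaged over the outputs
manual = -(y_true * np.log(y_pred)
           + (1 - y_true) * np.log(1 - y_pred)).mean(axis=-1)

print(keras_loss, manual)  # the two values should match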
Troy D
  • This is the correct answer, if `'binary_crossentropy'` in Keras does indeed work as you say. (I agree that the accepted answer is wrong, as well as all the others saying to use categorical_crossentropy.) Sadly, I can't find anything in the documentation about the effect of `tf.keras.losses.BinaryCrossentropy` on multiple outputs. Do you have a reference somewhere? – Lu Kas Oct 14 '22 at 12:09
0

binary_crossentropy with a sigmoid activation is used for binary (positive vs. negative) classification, whereas your case is multi-class classification. For multi-class classification, categorical_crossentropy with a softmax activation is used. Sigmoid generates the probability of the input being the positive class, while softmax generates one probability per class; the class with the maximum probability is then assigned to the input.
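
To illustrate the difference between the two activations (a small NumPy sketch with made-up logits):

import numpy as np

logits = np.array([2.0, 1.0, 0.1])

# Sigmoid: each output is an independent probability in (0, 1);
# the values are not constrained to sum to 1
sigmoid = 1 / (1 + np.exp(-logits))

# Softmax: the outputs form one distribution over mutually
# exclusive classes and sum to exactly 1
softmax = np.exp(logits) / np.exp(logits).sum()

print(sigmoid, sigmoid.sum())  # sums to > 1 here
print(softmax, softmax.sum())  # sums to 1.0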

-2

When there are multiple classes, categorical_crossentropy should be used. Refer to another answer here.

pyan
  • And how do I return a percentage with `categorical_crossentropy`? There are 200 classes, and each can have a percentage between 0 and 1. These classes are not exclusive of each other, meaning that I need all 200. – RaduS Oct 28 '17 at 06:52
-2

For multi-class classification problems, you should use categorical_crossentropy rather than binary_crossentropy. With this, when your model classifies an input, it will give a distribution of probabilities across all 200 categories. The category that receives the highest probability will be the output for that particular input.

You can see this when you call model.predict(). If you were to call this function on just one input, for example, and print the results, you would see 200 percentages (summing to 1 in total). The hope is that one of those 200 percentages is vastly higher than the others, which signals that the model thinks there is a strong probability that this is the correct output (category) for that particular input.
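
As a toy illustration of that prediction behavior (a hypothetical, untrained model with made-up shapes; the point is only that a softmax output over 200 classes sums to 1):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical untrained model with a softmax output over 200 classes
model = Sequential()
model.add(Dense(64, input_dim=10, activation='relu'))
model.add(Dense(200, activation='softmax'))

preds = model.predict(np.random.rand(1, 10))
print(preds.shape)     # (1, 200): one probability per category
print(preds.sum())     # ~1.0: a single distribution over the 200 categories
print(preds.argmax())  # index of the most probable category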

This video may help clarify the prediction piece. Printing out the predictions starts around 3:17, but to get the full context, you'll need to start from the beginning.

blackHoleDetector
  • What I am looking for is 200 categories, each with a percentage between 0 and 1, not a total sum of 1 across all 200. Would `categorical_crossentropy` help then? – RaduS Oct 28 '17 at 17:39