I'm reading about neural networks, and would also like to build my first NN at the same time (to complement my reading).
I have a data set like this:
DNA_seq Sample1Name Sample2Name ConcOfDNAInSample DNASeqFoundInProcessCat
AGGAG cat_0 cat_1 0.1 found_in_0
AGGAG cat_1 cat_2 0.4 found_in_3
ACCCC cat_1 cat_7 0.1 found_in_2
AGAGAGA cat_2 cat_10 1.9 found_in_1
ADAS cat_332 cat_103 8.9 found_in_1
Columns:
- DNA_seq -> a string of a DNA sequence (i.e. 'the sequences')
- Sample1Name -> categorical value explaining a chemical property of the solution that DNASeq is in.
- Sample2Name -> categorical value explaining a chemical property of the solution that DNASeq is in.
- ConcOfDNAInSample -> a quantitative value of DNA concentration in Sample2Name.
- DNASeqFoundInProcessCat -> This is the label that I want to predict. It is a categorical value with four categories (found_in_0 -> found_in_3). It is the output of three tests I ran on each DNA_seq to see whether, after I manipulate the original solution (which is found_in_0), the DNA_seq is still present.
My question: For an unseen set of sequences, I want the output set of labels to be a multi-class probability of 'found_in_1', 'found_in_2', 'found_in_3'.
i.e. if the above example was the output from my test set, my output would ideally look like this:
DNA_seq Sample1Name Sample2Name ConcOfDNAInSample DNASeqFoundInProcessCat
AGGAG cat_0 cat_1 0.1 (0.9,0.5,0.1)
AGGAG cat_1 cat_2 0.4 (0.8,0.7,0.3)
ACCCC cat_1 cat_7 0.1 (0.2,0.5,0.3)
AGAGAGA cat_2 cat_10 1.9 (0.7,0.2,0.9)
ADAS cat_332 cat_103 8.9 (0.6,0.8,0.7)
There are some notes:
It is possible that, because of the processes I am doing, some sequences are NOT in the original solution (found_in_0), but because bits of DNA can stick together, they CAN subsequently be in the other classes (found_in_1, found_in_2, found_in_3).
I am only interested in the output for the found_in_1, found_in_2 and found_in_3 class (i.e. I want a three class probability at the end, not a four class probability with found_in_0).
I am able to generate other features from the DNA seqs; this is just an example.
I can see from my data that the data set is imbalanced: the amount of data in found_in_3 is significantly lower than in the others (my full training data is about 80,000 rows, but only about 10,000 of those rows are found_in_3; the rest are all found_in_0, found_in_1 or found_in_2). My rough idea for handling this is sketched right after these notes.
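My current thought for the imbalance (just an assumption on my part, not something I've settled on) is to compute per-class weights from the label column and pass them to model.fit, roughly like this:
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

df = pd.read_csv('data')  # the same file I read in at step 1 below
labels = df['DNASeqFoundInProcessCat']

# 'balanced' weights each class inversely proportional to its frequency
weights = compute_class_weight('balanced', classes=np.unique(labels), y=labels)
# keys assume the labels get integer-encoded 0..3 in sorted order
class_weight = dict(enumerate(weights))

# later: model.fit(X_train, Y_train, epochs=150, class_weight=class_weight)
# (I still need to check how class_weight behaves with a multi-label output)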
What I'm trying to work out is the overall algorithm, and one specific point in particular. My idea was:
1. Read in the data.
import pandas as pd
df = pd.read_csv('data')
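This is also where I can take a quick first look at the label distribution (column name as in my table above):
# quick check of how many rows fall into each found_in_* category
print(df['DNASeqFoundInProcessCat'].value_counts())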
2. Split the data set into train and test.
from sklearn.model_selection import train_test_split
# X = the feature columns, y = the DNASeqFoundInProcessCat label column (see step 3)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=42)
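Given the imbalance, I assume I should also stratify this split so both sets keep roughly the same class proportions (again, just my assumption):
# stratify on the labels so the rare found_in_3 class appears in both splits
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)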
3. Understand the data set (i.e. this is where I saw the under-representation mentioned in the notes above). I have a series of functions for this... so let's say I end up with a standardised data set, which is the table above. A rough sketch of the feature/target preparation I'm imagining is below.
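Something along these lines (the 3-mer counts are just one option for sequence features, and all column names follow my table above; none of this is fixed):
import pandas as pd

# one-hot encode the two categorical sample columns
X_cat = pd.get_dummies(df[['Sample1Name', 'Sample2Name']])

# simple sequence features: counts of each 3-mer in DNA_seq
def kmer_counts(seq, k=3):
    return pd.Series(seq[i:i + k] for i in range(len(seq) - k + 1)).value_counts()

X_seq = df['DNA_seq'].apply(kmer_counts).fillna(0)

# numeric feature used as-is
X_num = df[['ConcOfDNAInSample']]

X = pd.concat([X_cat, X_seq, X_num], axis=1)

# three binary targets, one per class I care about (found_in_0 is dropped)
y = pd.DataFrame({
    'found_in_1': (df['DNASeqFoundInProcessCat'] == 'found_in_1').astype(int),
    'found_in_2': (df['DNASeqFoundInProcessCat'] == 'found_in_2').astype(int),
    'found_in_3': (df['DNASeqFoundInProcessCat'] == 'found_in_3').astype(int),
})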
4. Build the neural network.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model
I know the general idea here would be the TensorFlow equivalent of doing this in Keras (this example is for the 'iris' data set: I initialise a model, add some layers with an activation function, compile the model, print a summary of the model, fit the model, and then predict after this (not shown)):
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(8, input_dim=4, activation='relu'))  # 4 input features for iris
model.add(Dense(8, activation='relu'))               # input_dim is only needed on the first layer
model.add(Dense(3, activation='softmax'))            # 3 mutually exclusive classes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, epochs=150, verbose=0)
So I understand I want to replicate a similar set of steps for my data, and I'm trying to work out how to do this. What I can't understand is: do I have to use tf.nn.sigmoid_cross_entropy_with_logits for this problem (since each input can belong to more than one label, i.e. it can be present in found_in_1, found_in_2 and found_in_3, and this can produce a probability output per class)?
Or can I just use a softmax output like in the example above?
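In other words, is the multi-label version something like this sketch (a sigmoid unit per class with binary cross-entropy; n_features here is just a placeholder for however wide my final feature matrix ends up)?
from keras.models import Sequential
from keras.layers import Dense

n_features = X_train.shape[1]  # width of my feature matrix

model = Sequential()
model.add(Dense(8, input_dim=n_features, activation='relu'))
model.add(Dense(8, activation='relu'))
# one sigmoid unit per class -> three independent probabilities
# for found_in_1, found_in_2 and found_in_3
model.add(Dense(3, activation='sigmoid'))
# binary_crossentropy is (roughly) the Keras-level counterpart of
# tf.nn.sigmoid_cross_entropy_with_logits applied to each output unit
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=150, verbose=0)
probs = model.predict(X_test)  # shape (n_samples, 3): one probability per class per row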