
The text data is organized as a vector with 20,000 elements, like [2, 1, 0, 0, 5, ..., 0]. The i-th element indicates the frequency of the i-th word in a text.

The ground truth label data is also represented as a vector with 4,000 elements, like [0, 0, 1, 0, 1, ..., 0]. The i-th element indicates whether the i-th label is a positive label for a text. The number of positive labels differs from text to text.
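
For example, a toy version of this format (the real vectors have 20,000 and 4,000 elements; the values here are made up just to illustrate):

import numpy as np

# bag-of-words counts: element i is how often word i appears in the text
text_vector = np.array([2, 1, 0, 0, 5, 0], dtype=np.float32)

# multi-hot labels: element i is 1 if label i applies to the text
label_vector = np.array([0, 0, 1, 0, 1, 0], dtype=np.float32)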

I have a code for single-label text classification.

How can I edit the following code for multilabel text classification?

In particular, I would like to know the following points.

  • How to compute accuracy using TensorFlow.
  • How to set a threshold which judges whether a label is positive or negative. For instance, if the output is [0.80, 0.43, 0.21, 0.01, 0.32] and the ground truth is [1, 1, 0, 0, 1], the labels with scores over 0.25 should be judged as positive (see the small illustration after this list).
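
To make the behavior I want concrete, here is a plain NumPy illustration of the thresholding (0.25 is just my example value):

import numpy as np

scores = np.array([0.80, 0.43, 0.21, 0.01, 0.32])
ground_truth = np.array([1, 1, 0, 0, 1])

# everything above the threshold is treated as a positive label
predictions = (scores > 0.25).astype(int)   # -> [1, 1, 0, 0, 1]
print((predictions == ground_truth).all())  # True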

Thank you.

import tensorflow as tf

# hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_h = tf.Variable(tf.random_normal([n_in, n_out],mean = 0.0,stddev = 0.05))
        b_h = tf.Variable(tf.zeros([n_out]))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.relu(linarg)

        return self.output

# output Layer
class OutputLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean = 0.0, stddev = 0.05))
        b_o = tf.Variable(tf.zeros([n_out]))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.relu(linarg)

        return self.output

# model (x and y_ are assumed to be placeholder tensors, defined elsewhere,
# for the input text vectors and the multi-hot label vectors)
def model():
    h_layer = HiddenLayer(input = x, n_in = 20000, n_out = 1000)
    o_layer = OutputLayer(input = h_layer.output(), n_in = 1000, n_out = 4000)

    # loss function
    out = o_layer.output()
    cross_entropy = -tf.reduce_sum(y_*tf.log(out + 1e-9), name='xentropy')    

    # regularization
    l2 = (tf.nn.l2_loss(h_layer.w) + tf.nn.l2_loss(o_layer.w))
    lambda_2 = 0.01

    # compute loss
    loss = cross_entropy + lambda_2 * l2

    # compute accuracy for single label classification task
    correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))

    return loss, accuracy
Benben
  • I think there might be a better loss function to use besides cross-entropy. – Aaron Feb 16 '16 at 23:43
  • There are many different measures of accuracy for a multilabel classification problem: one-error accuracy, rank loss, mean average precision, etc. I'm still learning TensorFlow myself and haven't managed to correctly implement any of them yet. But perhaps this paper will help you: http://arxiv.org/pdf/1312.5419v3.pdf Let me know if you make any progress! – Eric Galluzzo Feb 24 '16 at 13:51
  • For a better idea of accuracy consider calculating precision and recall. – Abhishek Patel Apr 15 '17 at 02:11
  • @Benben what is `y_` I don't see it defined – SumNeuron Aug 29 '18 at 11:17

2 Answers


Change the relu of the output layer to a sigmoid, and modify the cross-entropy loss to the explicit mathematical formula of the sigmoid cross-entropy loss (the explicit loss was what worked in my case/version of TensorFlow).

import tensorflow as tf

# hidden Layer
class HiddenLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_h = tf.Variable(tf.random_normal([n_in, n_out],mean = 0.0,stddev = 0.05))
        b_h = tf.Variable(tf.zeros([n_out]))

        self.w = w_h
        self.b = b_h
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        self.output = tf.nn.relu(linarg)

        return self.output

# output Layer
class OutputLayer(object):
    def __init__(self, input, n_in, n_out):
        self.input = input

        w_o = tf.Variable(tf.random_normal([n_in, n_out], mean = 0.0, stddev = 0.05))
        b_o = tf.Variable(tf.zeros([n_out]))

        self.w = w_o
        self.b = b_o
        self.params = [self.w, self.b]

    def output(self):
        linarg = tf.matmul(self.input, self.w) + self.b
        #changed relu to sigmoid
        self.output = tf.nn.sigmoid(linarg)

        return self.output

# model
def model():
    h_layer = HiddenLayer(input = x, n_in = 20000, n_out = 1000)
    o_layer = OutputLayer(input = h_layer.output(), n_in = 1000, n_out = 4000)

    # loss function
    out = o_layer.output()
    # modified cross entropy to explicit mathematical formula of sigmoid cross entropy loss
    cross_entropy = -tf.reduce_sum(y_ * tf.log(out + 1e-9) + (1 - y_) * tf.log(1 - out + 1e-9), name='xentropy')

    # regularization
    l2 = (tf.nn.l2_loss(h_layer.w) + tf.nn.l2_loss(o_layer.w))
    lambda_2 = 0.01

    # compute loss
    loss = cross_entropy + lambda_2 * l2

    # accuracy below is unchanged from the single-label (argmax) version;
    # for multilabel you would threshold `out` instead
    correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, "float"))

    return loss, accuracy
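
A rough sketch of how model() could then be wired up for training (the placeholder shapes, optimizer, and learning rate here are only assumptions to show the wiring):

x  = tf.placeholder(tf.float32, [None, 20000])  # bag-of-words inputs
y_ = tf.placeholder(tf.float32, [None, 4000])   # multi-hot labels

loss, accuracy = model()
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # feed your own batches here:
    # sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})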
Alok Nayak

You have to use variations of the cross-entropy function in order to support multilabel classification. If you have fewer than about a thousand outputs you should use sigmoid_cross_entropy_with_logits; in your case, with 4,000 outputs, you may consider candidate sampling, as it is faster.
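
A minimal sketch of the sigmoid_cross_entropy_with_logits route (assuming the TensorFlow 1.x signature with labels/logits keyword arguments, and a single linear layer just to keep the example short):

import tensorflow as tf

x  = tf.placeholder(tf.float32, [None, 20000])
y_ = tf.placeholder(tf.float32, [None, 4000])

# logits are the raw pre-activation outputs; the op applies the sigmoid
# internally, so do not apply tf.nn.sigmoid yourself
w = tf.Variable(tf.random_normal([20000, 4000], stddev=0.05))
b = tf.Variable(tf.zeros([4000]))
logits = tf.matmul(x, w) + b

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=logits))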

How to compute accuracy using TensorFlow.

This depends on your problem and what you want to achieve. If you don't want to miss any object in an image, then if the classifier gets everything right but one label, you should count the whole image as an error. You can also count each missed or misclassified object as an error on its own. The latter, I think, is supported by sigmoid_cross_entropy_with_logits.
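
For example, both notions of accuracy could be sketched roughly like this (assuming `out` holds per-label probabilities and `y_` the multi-hot ground truth, both of shape [batch, 4000]):

import tensorflow as tf

out = tf.placeholder(tf.float32, [None, 4000])  # per-label probabilities
y_  = tf.placeholder(tf.float32, [None, 4000])  # multi-hot ground truth

predicted = tf.cast(tf.greater(out, 0.5), tf.float32)

# per-label accuracy: every label of every example counts separately
per_label_accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, y_), tf.float32))

# exact-match accuracy: an example counts only if all its labels are correct
exact_match = tf.reduce_all(tf.equal(predicted, y_), axis=1)
exact_match_accuracy = tf.reduce_mean(tf.cast(exact_match, tf.float32))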

How to set a threshold which judges whether a label is positive or negative. For instance, if the output is [0.80, 0.43, 0.21, 0.01, 0.32] and the ground truth is [1, 1, 0, 0, 1], the labels with scores over 0.25 should be judged as positive.

A threshold is one way to go; you have to decide which value to use. But that is some kind of hack, not real multilabel classification. For that you need the functions I mentioned above.
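
If you do go with a threshold, a sketch of the judgement from the question (0.25 is just the value from the question, not a recommendation):

import tensorflow as tf

scores = tf.constant([0.80, 0.43, 0.21, 0.01, 0.32])
labels = tf.constant([1.0, 1.0, 0.0, 0.0, 1.0])

# everything above the threshold is predicted positive
predictions = tf.cast(tf.greater(scores, 0.25), tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), tf.float32))

with tf.Session() as sess:
    print(sess.run(predictions))  # [1. 1. 0. 0. 1.]
    print(sess.run(accuracy))     # 1.0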

jorgemf
  • I don't know why people suggest 'sigmoid_cross_entropy_with_logits'. If it is what its name suggests, i.e. -Y*ln(sigmoid(logits)), then it will minimize the loss by giving high probability to every class, and in fact it was giving that in my case. – Alok Nayak Sep 13 '16 at 12:59
  • This function doesn't return a probability. And I don't see how it will minimize the loss by giving a high value. If you set your classes to 1 when present and 0 when the class is not present, then the network gives values close to 0 when the object is not in the image and values close to 1 or bigger (even 2 or 3) when the object is in the image. I am using it and it works pretty well. – jorgemf Sep 13 '16 at 17:52
  • It will minimize the loss by giving a high value to every class because there is no penalty (or 0 loss) for giving a high value to classes which are labelled 0. So one needs to modify the cross-entropy loss to binary cross-entropy: (y * ln(sigmoid(logits)) + 1-y * ln(sigmoid(1-logits))). sigmoid_cross_entropy_with_logits doesn't implement binary cross-entropy internally. I am surprised it is working in your case; are you using Theano etc.? – Alok Nayak Sep 14 '16 at 04:20
  • I think you are wrong with the maths. It is: y * ln(sigmoid(logits)) + (1-y) * ln(1-sigmoid(logits)). So: logits=0, y=0 => 0; logits=1, y=1 => 0; logits=1, y=0 => 1.3; logits=0, y=1 => 1.3. You can plot the function in Google and play with the numbers. Just search for y*-ln (1 / ( 1 + e^-x)) + (1-y)*-ln (1 - 1 / ( 1 + e^-x)) – jorgemf Sep 14 '16 at 12:31
  • My bad, ignore my math above. Here is what I was using, which worked for me: -tf.reduce_mean(tf.mul(y, tf.log(tf.nn.sigmoid(logits) + 1e-9)) + tf.mul(1-y, tf.log(1 - tf.nn.sigmoid(logits) + 1e-9))). This worked and what you suggested didn't; let me know if I am wrong with my argument. – Alok Nayak Sep 14 '16 at 13:18
  • It might be the version of tensorflow that you are using. The equations are almost the same (you added a small number to avoid 0s, and in tensorflow they use a max function). Your argument is wrong: if you replace the values in the equation you get errors when logits and y don't match and 0 when they are the same. So I don't know why it is not working for you, but the equations are ok. – jorgemf Sep 14 '16 at 16:43
  • No doubt if I replace the values in my equation I get errors when logits and y don't match and 0 when they are the same. No doubt about my loss definition. But in tensorflow's 'sigmoid_cross_entropy_with_logits', loss = -Y*ln(sigmoid(logits)). Please justify this loss, not the loss which I used. – Alok Nayak Sep 15 '16 at 05:45
  • I was talking about TF. I wrote the equation and tested it. Do it yourself, it works. I didn't check your equations. Tell me for which values the equations of TF don't work. – jorgemf Sep 15 '16 at 12:43
  • What you want to say is that it doesn't work for you. It has been working fine for me for a couple of months. sigmoid_cross_entropy_with_logits doesn't use the equation you said; it uses the one I wrote before (it is in the docs of tensorflow): y * ln(sigmoid(logits)) + (1-y) * ln(1-sigmoid(logits)) – jorgemf Sep 15 '16 at 16:39