
I am aiming to do big things with TensorFlow, but I'm trying to start small.

I have small greyscale squares (with a little noise) and I want to classify them according to their colour (e.g. 3 categories: black, grey, white). I wrote a little Python class to generate the squares and their one-hot label vectors, and modified TensorFlow's basic MNIST example to feed them in.

But it won't learn anything - e.g. for 3 categories it always guesses at chance level (≈33% correct).

import tensorflow as tf
import generate_data.generate_greyscale

data_generator = generate_data.generate_greyscale.GenerateGreyScale(28, 28, 3, 0.05)
ds = data_generator.generate_data(10000)
ds_validation = data_generator.generate_data(500)
xs = ds[0]
ys = ds[1]
num_categories = data_generator.num_categories

x = tf.placeholder("float", [None, 28*28])            # flattened 28x28 images
W = tf.Variable(tf.zeros([28*28, num_categories]))    # weights
b = tf.Variable(tf.zeros([num_categories]))           # biases
y = tf.nn.softmax(tf.matmul(x, W) + b)                # predicted class probabilities
y_ = tf.placeholder("float", [None, num_categories])  # one-hot target labels
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

# let batch_size = 100 --> therefore there are 100 batches of training data
xs = xs.reshape(100, 100, 28*28) # reshape into 100 minibatches of size 100
ys = ys.reshape((100, 100, num_categories)) # reshape into 100 minibatches of size 100

for i in range(100):
  batch_xs = xs[i]
  batch_ys = ys[i]
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

xs_validation = ds_validation[0]
ys_validation = ds_validation[1]
print sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})

My data generator looks like this:

import numpy as np
import random

class GenerateGreyScale():
    def __init__(self, num_rows, num_cols, num_categories, noise):
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.num_categories = num_categories
        # set a level of noisiness for the data
        self.noise = noise

    def generate_label(self):
        # random one-hot label over num_categories
        lab = np.zeros(self.num_categories)
        lab[random.randint(0, self.num_categories-1)] = 1
        return lab

    def generate_datum(self, lab):
        # map the category index to an evenly spaced grey level in [0, 1],
        # then draw each pixel uniformly within +/- noise of that level
        i = np.where(lab==1)[0][0]
        frac = float(1)/(self.num_categories-1) * i
        arr = np.random.uniform(max(0, frac-self.noise), min(1, frac+self.noise), self.num_rows*self.num_cols)
        return arr

    def generate_data(self, num):
        data_arr = np.zeros((num, self.num_rows*self.num_cols))
        label_arr = np.zeros((num, self.num_categories))
        for i in range(0, num):
            label = self.generate_label()
            datum = self.generate_datum(label)
            data_arr[i] = datum
            label_arr[i] = label
        #data_arr = data_arr.astype(np.float32)
        #label_arr = label_arr.astype(np.float32)
        return data_arr, label_arr
– 9th Dimension

4 Answers


For starters, try initializing your W matrix with random values, not zeros - with all-zero weights the model produces the same output for every input, so you're not giving the optimizer much to work with.

Instead of:

W = tf.Variable(tf.zeros([28*28, num_categories]))

Try:

W = tf.Variable(tf.truncated_normal([28*28, num_categories],
                                    stddev=0.1))
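
As an aside, the tutorial mentioned in the comments below pairs the random weight initialization with small positive bias constants. A minimal sketch of that, using the tutorial's 0.1 value rather than anything tuned for this problem:

b = tf.Variable(tf.constant(0.1, shape=[num_categories]))  # small positive biases, per the tutorial's convention
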
– dga
  • I am now using `tf.truncated_normal()` and `tf.constant()` for my weights and biases as suggested by you and their [tutorial](http://tensorflow.org/tutorials/mnist/pros/index.html#weight-initialization). But still no change: guesses y's randomly. :( – 9th Dimension Nov 24 '15 at 02:13
  • Are you sure there's not a bug with your data? Is `ds[0]` 100 items? `ds = data_generator.generate_data(10000)`, `xs = ds[0]`, `xs = xs.reshape(100, 100, 28*28)` - I'd be more comfortable if xs had the right number before you reshaped... – dga Nov 24 '15 at 05:34
  • Hi. The nparray xs begins as shape (10000, 784), then is resized to (100, 100, 784). I changed my post to include the class that creates data so you can try it out if you want. Thanks! – 9th Dimension Nov 24 '15 at 09:55
  • I have also been playing around with the sizes of the squares - for very small greyscale squares (below about 7x7 pixels) the softmax regression gets 0% error. Then increasing width or height by 1 pixel sends it back up to 66% error. – 9th Dimension Nov 24 '15 at 10:11

Your issue is that your gradients are growing without bound, causing the loss function to become NaN.

Take a look at this question: Why does TensorFlow example fail when increasing batch size?

Furthermore, make sure that you run the model for a sufficient number of steps. You are only making a single pass through your training set (100 batches of 100 examples), which is not enough for it to converge. Increase it to something like 2000 steps at a minimum (20 passes through your dataset).

Edit (I can't comment, so I'll add my thoughts here): the point of the post I linked is that you can use GradientDescentOptimizer as long as you make the learning rate something like 0.001. That was the issue: your learning rate was too high for the loss function you were using.

Alternatively, use a loss function that doesn't scale the gradients as aggressively: use tf.reduce_mean instead of tf.reduce_sum in the definition of cross_entropy.
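
A minimal sketch of those two changes applied to the question's code (the 0.001 learning rate is just the ballpark from the linked post, not a tuned value):

cross_entropy = -tf.reduce_mean(y_ * tf.log(y))  # mean instead of sum keeps the loss scale independent of batch size
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)  # much smaller step size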

– syncd

While dga's and syncd's responses were helpful, I tried non-zero weight initialization and larger datasets, but to no avail. What finally worked was switching to a different optimization algorithm.

I replaced:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

with

train_step = tf.train.AdamOptimizer(0.0005).minimize(cross_entropy)

I also embedded the training for loop in another for loop to train for several epochs (sketched after the log below), resulting in convergence like this:

 ===# EPOCH 0 #===
Error: 0.370000004768
 ===# EPOCH 1 #===
Error: 0.333999991417
 ===# EPOCH 2 #===
Error: 0.282000005245
 ===# EPOCH 3 #===
Error: 0.222000002861
 ===# EPOCH 4 #===
Error: 0.152000010014
 ===# EPOCH 5 #===
Error: 0.111999988556
 ===# EPOCH 6 #===
Error: 0.0680000185966
 ===# EPOCH 7 #===
Error: 0.0239999890327
 ===# EPOCH 8 #===
Error: 0.00999999046326
 ===# EPOCH 9 #===
Error: 0.00400000810623
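
For reference, the nested loop is nothing fancier than this sketch (num_epochs = 10 to match the log above; the printed error is 1 minus the validation accuracy):

num_epochs = 10
for epoch in range(num_epochs):
    # one full pass over the 100 minibatches prepared earlier
    for i in range(100):
        sess.run(train_step, feed_dict={x: xs[i], y_: ys[i]})
    # evaluate on the held-out validation set after each epoch
    acc = sess.run(accuracy, feed_dict={x: xs_validation, y_: ys_validation})
    print " ===# EPOCH %d #===" % epoch
    print "Error: %s" % (1 - acc)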

EDIT - WHY IT WORKS: I suppose the problem was that I didn't manually choose a good learning-rate schedule, and Adam was able to adapt one automatically.

– 9th Dimension

Found this question when I was having a similar issue. I fixed mine by scaling the features.

A little background: I was following the TensorFlow tutorial, but I wanted to use the data from Kaggle (see data here) to do the modeling. In the beginning I kept having the same issue: the model just doesn't learn. After rounds of troubleshooting, I realized that the Kaggle data was on a completely different scale, so I rescaled it to share the same [0, 1] range as TensorFlow's MNIST dataset.
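
For pixel data like the Kaggle digit CSVs, where intensities come as integers in [0, 255], the rescaling is just a division. A minimal sketch (raw_pixels is a hypothetical name for whatever array you loaded):

import numpy as np

raw_pixels = np.array([[0, 128, 255]], dtype=np.float32)  # hypothetical raw values in [0, 255]
scaled = raw_pixels / 255.0  # now in [0, 1], the same range as TensorFlow's MNIST images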

Just figured I would add my two cents here, in case other beginners trying to follow the tutorial's settings get stuck like I did =)

– ZEE
  • Yes - for transfer learning it's essential to check that your new data are preprocessed the same way as the model's original training data. Note, though, that the problem I had was to do with the learning rate: I was training a model from scratch, with artificial data already generated in the interval [0, 1]. – 9th Dimension Aug 30 '16 at 12:57