Here is the code that I am using. I'm trying to get a 1, a 0, or ideally a probability as the prediction for each row of a real test set. When I just split up the training set and evaluate on part of it I get ~93% accuracy, but when I train the program and run it on the actual test set (the one without the 1's and 0's filled in in column 1) it returns nothing but nan's.

import tensorflow as tf
import numpy as np
from numpy import genfromtxt
import sklearn

# Convert to one hot
def convertOneHot(data):
    y=np.array([int(i[0]) for i in data])
    y_onehot=[0]*len(y)
    for i,j in enumerate(y):
        y_onehot[i]=[0]*(y.max() + 1)
        y_onehot[i][j]=1
    return (y,y_onehot)


data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-test-actual.csv',delimiter=',')  # Actual test data

# Overwrite the nan's in the (empty) label column of the actual test data
test_data[:, 0] = 1

x_train=np.array([ i[1::] for i in data])
y_train,y_train_onehot = convertOneHot(data)

x_test=np.array([ i[1::] for i in test_data])
y_test,y_test_onehot = convertOneHot(test_data)
A=data.shape[1]-1 # Number of features, Note first is y
B=len(y_train_onehot[0])
tf_in = tf.placeholder("float", [None, A]) # Features
tf_weight = tf.Variable(tf.zeros([A,B]))
tf_bias = tf.Variable(tf.zeros([B]))
tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

# Training via backpropagation
tf_softmax_correct = tf.placeholder("float", [None,B])
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

# Train using tf.train.GradientDescentOptimizer
tf_train_step = tf.train.GradientDescentOptimizer(0.01).minimize(tf_cross_entropy)

# Add accuracy checking nodes
tf_correct_prediction = tf.equal(tf.argmax(tf_softmax,1), tf.argmax(tf_softmax_correct,1))
tf_accuracy = tf.reduce_mean(tf.cast(tf_correct_prediction, "float"))

saver = tf.train.Saver([tf_weight,tf_bias])

# Initialize and run
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

print("...")
# Run the training
for i in range(1):
    sess.run(tf_train_step, feed_dict={tf_in: x_train, tf_softmax_correct: y_train_onehot})
    #print y_train_onehot
    saver.save(sess, 'trained_csv_model')

    ans = sess.run(tf_softmax, feed_dict={tf_in: x_test})
    print ans

    # Print accuracy
    #result = sess.run(tf_accuracy, feed_dict={tf_in: x_test, tf_softmax_correct: y_test_onehot})
    #print result

When I print ans I get the following.

[[ nan  nan]
 [ nan  nan]
 [ nan  nan]
 ..., 
 [ nan  nan]
 [ nan  nan]
 [ nan  nan]]

I don't know what I'm doing wrong here. All I want is for ans to yield a 1 or a 0 or, ideally, an array of probabilities in which every row has length 2.

I don't expect that many people will be able to answer this question, but please try. I've been stuck for 2 days waiting for a stroke of genius that hasn't come, so I figured I would ask. Thank you!

The test_data comes out looking like this:

[[  1.00000000e+00   8.85519080e-01   4.30000000e+01 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
 [  1.00000000e+00   4.63295269e-01   5.70000000e+01 ...,   4.00000000e+00
0.00000000e+00   2.00000000e+00]
 [  1.00000000e+00   4.32750360e-02   5.90000000e+01 ...,   1.00000000e+00
0.00000000e+00   2.00000000e+00]
 ..., 
 [  1.00000000e+00   8.15963730e-02   7.00000000e+01 ...,   0.00000000e+00
0.00000000e+00              nan]
 [  1.00000000e+00   3.35456547e-01   5.60000000e+01 ...,   2.00000000e+00
1.00000000e+00   3.00000000e+00]
 [  1.00000000e+00   4.41841663e-01   2.90000000e+01 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]

The only reason the first value in each row equals 1 is that I overwrote the nan's that filled that position, in order to avoid errors. Note that everything after the first column is a feature; the first column is what I'm trying to predict.

EDIT:

I changed the code to the following:

import tensorflow as tf
import numpy as np
from numpy import genfromtxt
import sklearn
from sklearn.cross_validation import train_test_split
from tensorflow import Print

# Convert to one hot
def convertOneHot(data):
    y=np.array([int(i[0]) for i in data])
    y_onehot=[0]*len(y)
    for i,j in enumerate(y):
        y_onehot[i]=[0]*(y.max() + 1)
        y_onehot[i][j]=1
    return (y,y_onehot)


#buildDataFromIris()


data = genfromtxt('cs-training.csv',delimiter=',')  # Training data
test_data = genfromtxt('cs-test-actual.csv',delimiter=',')  # Test data

#for i in test_data[0]:
#    print i
#print test_data

#print test_data
# Overwrite the nan's in the (empty) label column of the actual test data
test_data[:, 0] = 1.

#print 1, test_data

x_train=np.array([ i[1::] for i in data])
y_train,y_train_onehot = convertOneHot(data)
#print len(x_train), len(y_train), len(y_train_onehot)

x_test=np.array([ i[1::] for i in test_data])
y_test,y_test_onehot = convertOneHot(test_data)
#for u in y_test_onehot[0]:
#    print u
#print y_test_onehot
#print len(x_test), len(y_test), len(y_test_onehot)
#print x_test[0]

#print '1'

#  A = number of features (10 for this data set)
#  B = number of classes (2 for this data set)
A=data.shape[1]-1 # Number of features, Note first is y
#print A
B=len(y_train_onehot[0])
#print B
#print y_train_onehot
tf_in = tf.placeholder("float", [None, A]) # Features
tf_weight = tf.Variable(tf.zeros([A,B]))
tf_bias = tf.Variable(tf.zeros([B]))
tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
matmul_result = tf.matmul(tf_in, tf_weight)
matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
tf_softmax = tf.nn.softmax(matmul_result + tf_bias)
print tf_bias
print tf_weight
print tf_in
print matmul_result

# Training via backpropagation
tf_softmax_correct = tf.placeholder("float", [None,B])
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

print tf_softmax_correct

# Train using tf.train.GradientDescentOptimizer
tf_train_step = tf.train.GradientDescentOptimizer(0.01).minimize(tf_cross_entropy)

# Add accuracy checking nodes
tf_correct_prediction = tf.equal(tf.argmax(tf_softmax,1), tf.argmax(tf_softmax_correct,1))
tf_accuracy = tf.reduce_mean(tf.cast(tf_correct_prediction, "float"))

print tf_correct_prediction
print tf_accuracy

#saver = tf.train.Saver([tf_weight,tf_bias])

# Initialize and run
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

print("...")
prediction = []
# Run the training
#probabilities = []
#print y_train_onehot
#print '-----------------------------------------'
for i in range(1):
    sess.run(tf_train_step, feed_dict={tf_in: x_train, tf_softmax_correct: y_train_onehot})
    #print y_train_onehot
    #saver.save(sess, 'trained_csv_model')

    ans = sess.run(tf_softmax, feed_dict={tf_in: x_test})
    print ans

In the printout I see that one of the tensors has a Boolean dtype. I don't know if that is the issue, but take a look at the following and see if there is any way that you can help.

Tensor("Print_16:0", shape=TensorShape([Dimension(2)]), dtype=float32)
Tensor("Print_17:0", shape=TensorShape([Dimension(10), Dimension(2)]), dtype=float32)
Tensor("Print_18:0", shape=TensorShape([Dimension(None), Dimension(10)]), dtype=float32)
Tensor("Print_19:0", shape=TensorShape([Dimension(None), Dimension(2)]), dtype=float32)
Tensor("Placeholder_9:0", shape=TensorShape([Dimension(None), Dimension(2)]), dtype=float32)
Tensor("Equal_4:0", shape=TensorShape([Dimension(None)]), dtype=bool)
Tensor("Mean_4:0", shape=TensorShape([]), dtype=float32)
...
[[ nan  nan]
 [ nan  nan]
 [ nan  nan]
 ..., 
 [ nan  nan]
 [ nan  nan]
 [ nan  nan]]
Ravaal

2 Answers

I don't know the direct answer, but I know how I'd approach debugging it: tf.Print. It's an op that prints the value as tensorflow is executing, and returns the tensor for further computation, so you can just sprinkle them inline in your model.

Try throwing in a few of these. Instead of this line:

tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

Try:

tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
matmul_result = tf.matmul(tf_in, tf_weight)
matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
tf_softmax = tf.nn.softmax(matmul_result + tf_bias)

to see what Tensorflow thinks the intermediate values are. If the NaNs are showing up earlier in the pipeline, it should give you a better idea of where the problem lies. Good luck! If you get some data out of this, feel free to follow up and we'll see if we can get you further.
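Before instrumenting the graph, it may also be worth checking the arrays on the NumPy side, since the test_data printed in the question already shows a nan in a feature column. The following is only a sketch on made-up data: the array values and the helper name nan_report are assumptions for illustration, not taken from the question's CSV files. It counts the NaNs in each column and shows how a single NaN feature turns that row's matmul result into NaN, even with all-zero weights.

```python
import numpy as np

def nan_report(x):
    """Return the number of NaN entries in each column of a 2-D array."""
    return np.isnan(x).sum(axis=0)

# Made-up stand-in for x_test: 3 rows, 2 features, one missing value.
x = np.array([[0.5, 43.0],
              [0.1, np.nan],
              [0.3, 29.0]], dtype=np.float32)
w = np.zeros((2, 2), dtype=np.float32)   # zero weights, as in the question

nan_counts = nan_report(x)               # NaNs per feature column
logits = x.dot(w)                        # NaN * 0 is still NaN
bad_rows = np.isnan(logits).any(axis=1)  # rows whose output is poisoned

print(nan_counts)
print(bad_rows)
```

If the same check on x_train also reports NaNs, a single full-batch gradient step would be expected to push NaN into every weight, which would be consistent with seeing NaN in every output row rather than only in a few.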

Updated to add: Here's a stripped-down debugging version to try, where I got rid of the input functions and just generated some random data:

import tensorflow as tf
import numpy as np

def dense_to_one_hot(labels_dense, num_classes=10):
  """Convert class labels from scalars to one-hot vectors."""
  num_labels = labels_dense.shape[0]
  index_offset = np.arange(num_labels) * num_classes
  labels_one_hot = np.zeros((num_labels, num_classes))
  labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
  return labels_one_hot

x_train=np.random.normal(0, 1, [50,10])
y_train=np.random.randint(0, 10, [50])
y_train_onehot = dense_to_one_hot(y_train, 10)

x_test=np.random.normal(0, 1, [50,10])
y_test=np.random.randint(0, 10, [50])
y_test_onehot = dense_to_one_hot(y_test, 10)

#  A = number of features, B = number of classes
#  (both are 10 for this random data)

A=10
B=10
tf_in = tf.placeholder("float", [None, A]) # Features
tf_weight = tf.Variable(tf.zeros([A,B]))
tf_bias = tf.Variable(tf.zeros([B]))
tf_softmax = tf.nn.softmax(tf.matmul(tf_in,tf_weight) + tf_bias)

tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")
tf_weight = tf.Print(tf_weight, [tf_weight], "Weight: ")
tf_in = tf.Print(tf_in, [tf_in], "TF_in: ")
matmul_result = tf.matmul(tf_in, tf_weight)
matmul_result = tf.Print(matmul_result, [matmul_result], "Matmul: ")
tf_softmax = tf.nn.softmax(matmul_result + tf_bias)

# Training via backpropagation
tf_softmax_correct = tf.placeholder("float", [None,B])
tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

# Train using tf.train.GradientDescentOptimizer
tf_train_step = tf.train.GradientDescentOptimizer(0.01).minimize(tf_cross_entropy)

# Add accuracy checking nodes
tf_correct_prediction = tf.equal(tf.argmax(tf_softmax,1), tf.argmax(tf_softmax_correct,1))
tf_accuracy = tf.reduce_mean(tf.cast(tf_correct_prediction, "float"))

print tf_correct_prediction
print tf_accuracy

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for i in range(1):
    print "Running the training step"
    sess.run(tf_train_step, feed_dict={tf_in: x_train, tf_softmax_correct: y_train_onehot})
    #print y_train_onehot
    #saver.save(sess, 'trained_csv_model')

    print "Running the eval step"
    ans = sess.run(tf_softmax, feed_dict={tf_in: x_test})
    print ans

You should see lines starting with "Bias: ", etc.

dga
  • First and foremost I have to thank you for trying to help me out with this issue. I am trying to debug the code and find out where the numbers change to NaN's but I still can't find it. I tried your tf.Print and it doesn't yield any output; just the same array of NaN's. – Ravaal Nov 30 '15 at 14:50
  • please check the most recent edit I made to the code in the OP. There is a boolean in there and I don't know if that is where the code is going wrong. If it is, maybe you can suggest a way to fix it. – Ravaal Nov 30 '15 at 16:06
  • You didn't put the prints in the order I suggested. You need to interleave them: Right after you define tf_bias, reassign `tf_bias = tf.Print(tf_bias, [tf_bias], "Bias: ")`. The print operator is only executed as data flows through it. Specifically, you need to do those reassigns BEFORE the matmul, because that's where the data is going to... – dga Nov 30 '15 at 19:18
  • The output you show in the edit 2 is from 'print ', not from the tf.Print op. Can you show the output from tf.Print? Printing the tf tensors in python will just get you output like `Tensor("Print_16:0", shape=TensorShape([Dimension(2)]), dtype=float32)`, but that doesn't tell you what's in the tensor. `tf.Print` will show you the _runtime_ values as it executes, and that's where you should be able to spot the NaN's. – dga Dec 01 '15 at 06:16
  • `tf.Print` doesn't yield any output whatsoever. I tried adding `sess.run(tf_in)` and I get an error even after I define `sess = tf.Session()`. I also tried `print sess.run(tf_in)` which provides an error as well. I tried just plain `tf.Print()` and still no output. – Ravaal Dec 01 '15 at 18:33
  • Can you run the version I just added and show the output? Is it crashing before it actually runs? – dga Dec 01 '15 at 19:00
  • Thank you so much for your help btw. I just reviewed your profile and I can't tell you how grateful I am. Unfortunately, here is the output `Tensor("Equal_9:0", shape=TensorShape([Dimension(None)]), dtype=bool) Tensor("Mean_9:0", shape=TensorShape([]), dtype=float32) Running the training step Running the eval step [[ nan nan] [ nan nan] [ nan nan] ..., [ nan nan] [ nan nan] [ nan nan]]` – Ravaal Dec 01 '15 at 19:04
  • I'm seriously confused. I stripped out your data input and replaced it with some random and I see the expected logging output. I'll update the above 'test version' with my version where I stub out your input functions so I can run the code. If you don't see the Prints there, there's something wrong with your tf install (try updating from -head and reinstalling?). If you do, then it's something breaking earlier in the graph due to your data input scheme. – dga Dec 01 '15 at 19:17
  • Tried uninstalling and reinstalling to no avail. I emailed you at your cmu address about this with the output. – Ravaal Dec 01 '15 at 19:33
  • I don't know if I'm using the wrong version or what the problem is – Ravaal Dec 01 '15 at 20:41
  • "(try updating from -head and reinstalling?)" can you give me a line of code to show what you mean by updating from -head? I assume it's an Ubuntu command right? – Ravaal Dec 01 '15 at 21:14

tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax))

This was my problem on a project I was testing. Specifically, it ended up being 0*log(0), which produces nan.

If you replace this with:

tf_cross_entropy = -tf.reduce_sum(tf_softmax_correct*tf.log(tf_softmax + 1e-50))

It should avoid the problem.
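The 0*log(0) case can be reproduced outside TensorFlow. Below is a small NumPy sketch, assuming float32 values (which is what the "float" placeholders in the question use). One caveat worth checking on your own setup: 1e-50 is smaller than the smallest positive float32 (roughly 1.4e-45), so in float32 it rounds to 0.0 and the guard has no effect. The eps value of 1e-10 and the clipping variant are assumptions chosen for illustration, not values from the answer above.

```python
import numpy as np

y = np.array([0.0, 1.0], dtype=np.float32)   # one-hot target
p = np.array([0.0, 1.0], dtype=np.float32)   # softmax output that saturated

with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.sum(y * np.log(p))           # 0 * log(0) = 0 * -inf = nan

tiny = np.float32(1e-50)                     # underflows to 0.0 in float32
eps = np.float32(1e-10)                      # assumed epsilon, representable in float32

with np.errstate(divide="ignore", invalid="ignore"):
    still_nan = -np.sum(y * np.log(p + tiny))         # guard is a no-op

guarded = -np.sum(y * np.log(p + eps))                # finite
clipped = -np.sum(y * np.log(np.clip(p, eps, 1.0)))   # finite, p stays <= 1

print(naive, tiny, still_nan, guarded, clipped)
```

In the graph, the clipping variant could presumably be written with tf.clip_by_value applied to tf_softmax before taking the log.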

I've also used reduce_mean rather than reduce_sum. If you double the batch size and use reduce_sum, the cost (and the magnitude of the gradient) doubles as well. In addition, when using tf.Print (which prints to the console TensorFlow was started from), reduce_mean makes the printed cost more comparable when varying the batch size.
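To illustrate the batch-size point, here is a minimal NumPy sketch. The probabilities and the helper name per_example_loss are made up for illustration. It computes the cross-entropy of each example, then compares the summed and the averaged cost for a batch and for the same batch repeated twice.

```python
import numpy as np

def per_example_loss(y_onehot, probs):
    """Cross-entropy of each row, given one-hot targets and predicted probabilities."""
    return -np.sum(y_onehot * np.log(probs), axis=1)

y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.8, 0.2], [0.4, 0.6]])

# Double the batch by repeating the same examples.
y2 = np.concatenate([y, y])
p2 = np.concatenate([p, p])

sum_small = per_example_loss(y, p).sum()
sum_big = per_example_loss(y2, p2).sum()
mean_small = per_example_loss(y, p).mean()
mean_big = per_example_loss(y2, p2).mean()

print(sum_small, sum_big)    # summed cost doubles with the batch
print(mean_small, mean_big)  # mean cost is unchanged
```

Because the gradient scales with the cost, a learning rate tuned for the summed cost at one batch size may be too large at another, while the averaged cost keeps the step size roughly comparable.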

Specifically this is what I'm using now when debugging:

cross_entropy = -tf.reduce_sum(y*tf.log(model + 1e-50)) ## avoid nan due to 0*log(0)
cross_entropy = tf.Print(cross_entropy, [cross_entropy], "cost") #print to the console tensorflow was started from

neuron
  • Ahhh, now I see why tf.Print never worked for me. I was using IPython Notebook the entire time. Thank you for this answer. I will be upvoting it but the problem has already been solved. If you think that this will solve the problem, as I do, then I will click the check. – Ravaal Jan 14 '16 at 16:22
  • The `tf.nn.softmax_cross_entropy_with_logits` loss function appears to be more robust for these corner cases: http://stackoverflow.com/questions/34240703/difference-between-tensorflow-tf-nn-softmax-and-tf-nn-softmax-cross-entropy-with – Lenar Hoyt Jun 11 '16 at 01:12
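One reason a loss computed directly from logits can be more robust is that it never has to take the log of a saturated probability. The following is a NumPy sketch of that idea using the log-sum-exp trick. It is an illustration under that assumption, not TensorFlow's actual implementation, and the function name is hypothetical.

```python
import numpy as np

def softmax_cross_entropy_from_logits(logits, y_onehot):
    """Per-row cross-entropy computed from logits with the log-sum-exp trick."""
    # Shift each row so its largest logit is 0; exp() can then never overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    # log(softmax) written without ever forming log(0).
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(y_onehot * log_softmax).sum(axis=1)

logits = np.array([[1000.0, -1000.0],   # extreme logits: softmax saturates to [1, 0]
                   [0.0, 0.0]])         # uniform prediction
y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

losses = softmax_cross_entropy_from_logits(logits, y)
print(losses)
```

With the saturated first row, the naive softmax-then-log route would hit 0*log(0), whereas here both losses stay finite.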