Tensorflow accuracy at .99 but predictions awful

Question

Maybe I'm making predictions wrong?

Here's the project: I have a greyscale input image that I am trying to segment. The segmentation is a simple binary classification (think foreground vs. background), so the ground truth (y) is a matrix of 0s and 1s, i.e. there are 2 classes. The input image is square, so I use a single variable, n_input, for both dimensions.

My accuracy essentially converges to 0.99, but when I make a prediction I get all zeros. EDIT: there is a single 1 in each output matrix, both in the same place.

Here's my session code (everything else is working):

with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph_def)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    flag = 0
    # while flag == 0:
    while step * batch_size < training_iters:
        batch_y, batch_x = data.train.next_batch(batch_size)
        # pdb.set_trace()
        # batch_x = batch_x.reshape((batch_size, n_input))
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size)
        # batch_y = batch_y.reshape((batch_size, n_output, n_classes))
        batch_y = batch_y.reshape((batch_size, 200, 200, n_classes))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            flag = 1
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    save_path = "model.ckpt"
    saver.save(sess, save_path)

    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0635.tif')
    batch_x = np.array(im)
    pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    # pdb.set_trace()
    prediction = sess.run(pred, feed_dict={x: batch_x, keep_prob: 1.})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = prediction[0][i][j][k]
                else:
                    arr2[i][j] = prediction[0][i][j][k]
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")

Since there are two classes, the end part (with the nested loops) just splits the prediction into two n_input x n_input (200x200) matrices, one per class.
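For reference, the nested loops do the same job as slicing the last axis. Here is a minimal NumPy sketch (Python 3), assuming `prediction` has shape `(1, n_input, n_input, 2)` as in the question; the random array is a stand-in, not real model output:

```python
import numpy as np

n_input = 200  # image side length, as in the question

# Stand-in for the network output (hypothetical values, not real predictions).
rng = np.random.default_rng(0)
prediction = rng.random((1, n_input, n_input, 2))

# Loop version, as in the question.
arr1 = np.empty((n_input, n_input))
arr2 = np.empty((n_input, n_input))
for i in range(n_input):
    for j in range(n_input):
        arr1[i][j] = prediction[0][i][j][0]
        arr2[i][j] = prediction[0][i][j][1]

# Equivalent slicing: take each channel of the first (only) image.
chan0 = prediction[0, :, :, 0]
chan1 = prediction[0, :, :, 1]

print(np.array_equal(arr1, chan0), np.array_equal(arr2, chan1))
```

Slicing avoids the Python-level loops and makes it obvious which axis holds the classes.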

I saved the prediction arrays to a CSV file, and like I said, they were all zeros.

I have also confirmed all data is correct (dimensions and values).

Why would the training converge when the predictions are awful?

If you want to look at all the code, here it is...

import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image

# Import MNIST data
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)


# Parameters
learning_rate = 0.001
training_iters = 20000
batch_size = 128
display_step = 1

# Network Parameters
n_input = 200 # side length of the square input image (200x200)
n_output = 40000 # number of output pixels (200*200)
n_classes = 2
#n_input = 200

dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input, n_input])
y = tf.placeholder(tf.float32, [None, n_input, n_input, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, n_input, n_input, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)
    conv1 = tf.nn.local_response_normalization(conv1)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = tf.nn.local_response_normalization(conv2)
    conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # Max Pooling (down-sampling)
    conv3 = tf.nn.local_response_normalization(conv3)
    conv3 = maxpool2d(conv3, k=2)

    # pdb.set_trace()

    # Fully connected layer
    # Reshape conv3 output to fit fully connected layer input
    fc1 = tf.reshape(conv3, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    output = []
    for i in xrange(2):
        output.append(tf.nn.softmax(tf.add(tf.matmul(fc1, weights['out']), biases['out'])))

    return output
    # return tf.nn.softmax(tf.add(tf.matmul(fc1, weights['out']), biases['out']))


# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # 5x5 conv, 64 inputs, 128 outputs
    'wc3': tf.Variable(tf.random_normal([5, 5, 64, 128])),
    # fully connected, 25*25*128 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([25*25*128, 1024])),
    # 1024 inputs, n_output (200*200) outputs, one per pixel
    'out': tf.Variable(tf.random_normal([1024, n_output]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bc3': tf.Variable(tf.random_normal([128])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_output]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)
# pdb.set_trace()
pred = tf.pack(tf.transpose(pred,[1,2,0]))
pred = tf.reshape(pred, [-1,n_input,n_input,n_classes])
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()

def convert_to_2_channel(x, batch_size):
    #assume input has dimension (batch_size,x,y)
    #output will have dimension (batch_size,x,y,2)
    output = np.empty((batch_size, 200, 200, 2))

    temp_arr1 = np.empty((batch_size, 200, 200))
    temp_arr2 = np.empty((batch_size, 200, 200))

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                if x[i][j][k] == 1:
                    temp_arr1[i][j][k] = 1
                    temp_arr2[i][j][k] = 0
                else:
                    temp_arr1[i][j][k] = 0
                    temp_arr2[i][j][k] = 1

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                for l in xrange(2):
                    if l == 0:
                        output[i][j][k][l] = temp_arr1[i][j][k]
                    else:
                        output[i][j][k][l] = temp_arr2[i][j][k]

    return output

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph_def)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    flag = 0
    # while flag == 0:
    while step * batch_size < training_iters:
        batch_y, batch_x = data.train.next_batch(batch_size)
        # pdb.set_trace()
        # batch_x = batch_x.reshape((batch_size, n_input))
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size)
        # batch_y = batch_y.reshape((batch_size, n_output, n_classes))
        batch_y = batch_y.reshape((batch_size, 200, 200, n_classes))
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            flag = 1
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    save_path = "model.ckpt"
    saver.save(sess, save_path)

    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0635.tif')
    batch_x = np.array(im)
    pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    # pdb.set_trace()
    prediction = sess.run(pred, feed_dict={x: batch_x, keep_prob: 1.})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = prediction[0][i][j][k]
                else:
                    arr2[i][j] = prediction[0][i][j][k]
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")

    # Calculate accuracy on the first 256 test images
    print "Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: data.test.images[:256],
                                      y: data.test.labels[:256],
                                      keep_prob: 1.})

Are the two classes very unbalanced? (Like 100 to 1). Can you give the exact proportion. — Olivier Moindrot, Jun 18 '16 at 17:30
@OlivierMoindrot from a script I just wrote to calculate, the proportion of 1's/0's is .23 so I don't think that is it — Kendall Weihe, Jun 18 '16 at 17:45
@OlivierMoindrot could it be the way in which I classify? `pred` is dimension `(?,200,200,2)` so `batch_y` is converted into the same dimensions (from `(?,200,200)`) where each "channel" is the classification. In other words the first channel would be classifications for zeros and the second for ones. Furthermore, the two channels are basically inverses (opposites?) of each other. — Kendall Weihe, Jun 18 '16 at 17:47
@OlivierMoindrot my results are `Iter 128, Minibatch Loss= 0.713277, Training Accuracy= 0.99375` — Kendall Weihe, Jun 18 '16 at 18:04
@OlivierMoindrot oh oh oh, here's something that might help. I found a single 1 in the prediction matrix... — Kendall Weihe, Jun 18 '16 at 18:09
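The two-channel label conversion discussed in these comments can be written without loops. This is a minimal NumPy sketch (Python 3), not the poster's code: the function keeps the name `convert_to_2_channel` but drops the `batch_size` argument, and it follows the convention of the posted loop version, where channel 0 is 1 wherever the label is 1 and channel 1 is its complement.

```python
import numpy as np

def convert_to_2_channel(labels):
    """Map (batch, H, W) labels in {0, 1} to (batch, H, W, 2) one-hot.

    Channel 0 is 1 where the label is 1, channel 1 is 1 elsewhere,
    matching the loop-based function in the question.
    """
    foreground = (labels == 1).astype(np.float64)
    return np.stack([foreground, 1.0 - foreground], axis=-1)

labels = np.array([[[0, 1], [1, 0]]])  # one tiny 2x2 "image"
one_hot = convert_to_2_channel(labels)
print(one_hot.shape)
```

Because the two channels always sum to 1 per pixel, they are exactly the "inverses" described above.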

score 8 · Accepted Answer · answered Jun 18 '16 at 19:21

Errors in the code

There are multiple errors in your code:

You shouldn't call tf.nn.sigmoid_cross_entropy_with_logits on the output of a softmax layer; pass it the unscaled logits instead. From the documentation:

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

- In fact, since you have 2 classes, you should use a softmax loss: tf.nn.softmax_cross_entropy_with_logits.

When you use tf.argmax(pred, 1), you only apply argmax over axis 1, which is the height of the output image. You should use tf.argmax(pred, 3), which takes the argmax over the last axis (of size 2).
- This might explain why you get 0.99 accuracy.
- On the output image, it takes the argmax over the height of the image, which is 0 by default (as all values are equal for each channel).
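To make the argmax point concrete, here is a small NumPy sketch (Python 3; NumPy stands in for the TensorFlow ops, and the tiny shapes and values are made up for illustration). The labels say every pixel is class 0 and the predictions say every pixel is class 1, so the true per-pixel accuracy is zero, yet the axis-1 metric reports a perfect score:

```python
import numpy as np

batch, h, w = 2, 4, 4

# Labels: every pixel is class 0 (one-hot on the last axis).
y = np.zeros((batch, h, w, 2))
y[..., 0] = 1.0

# Predictions: every pixel is (wrongly) assigned to class 1.
pred = np.zeros((batch, h, w, 2))
pred[..., 1] = 1.0

# Metric from the question: argmax over axis 1 (image height).
# Each column is constant along the height, so argmax returns 0 for both arrays.
wrong_axis_acc = np.mean(np.argmax(pred, 1) == np.argmax(y, 1))

# Per-pixel metric: argmax over the last axis (the 2 classes).
per_pixel_acc = np.mean(np.argmax(pred, 3) == np.argmax(y, 3))

print(wrong_axis_acc, per_pixel_acc)  # 1.0 0.0
```

This is a deliberately extreme case; it does not reproduce the 0.99 from the question, only the mechanism by which the reported accuracy can be unrelated to prediction quality.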

Wrong model

The biggest drawback is that your model in general will be very hard to optimize.

You have a softmax over 40,000 classes, which is huge.
You do not take advantage at all of the fact that you want to output an image (the prediction foreground / background).
- for instance prediction 2,345 is highly correlated with prediction 2,346 and prediction 2,545 but you don't take that into account
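The correlation between those outputs is easier to see once the flat indices are mapped back to pixel coordinates. A minimal sketch (Python 3), assuming the 40,000 outputs are laid out row by row over the 200x200 image; `to_row_col` is a hypothetical helper, not part of the posted code:

```python
n_input = 200  # image side length, as in the question

def to_row_col(flat_index, width=n_input):
    """Convert a flat output index into (row, col), assuming row-major layout."""
    return divmod(flat_index, width)

for idx in (2345, 2346, 2545):
    print(idx, to_row_col(idx))
# 2345 (11, 145)
# 2346 (11, 146)
# 2545 (12, 145)
```

Output 2,346 is the pixel immediately to the right of 2,345 and output 2,545 is the pixel directly below it, but a fully connected output layer treats all three as unrelated classes.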

I recommend reading a bit about semantic segmentation first:

this paper: Fully Convolutional Networks for Semantic Segmentation
these slides from CS231n (Stanford): especially the part about upsampling and deconvolution

Recommendations

If you want to work with TensorFlow, you will need to start small. First try a very simple network with maybe 1 hidden layer.

You need to print the shapes of all your tensors to make sure they correspond to what you expect. For instance, if you had printed the shape of tf.argmax(y, 1), you would have realized it is [batch_size, 200, 2] instead of the expected [batch_size, 200, 200].
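The same shape check can be tried quickly in NumPy, whose `argmax` follows the same axis convention. A small sketch (Python 3), with a hypothetical batch size of 3 and an all-zero array standing in for the labels:

```python
import numpy as np

batch_size = 3  # arbitrary, for illustration only
y = np.zeros((batch_size, 200, 200, 2))  # same layout as the one-hot labels

shape_axis1 = np.argmax(y, 1).shape  # height axis removed, class axis kept
shape_axis3 = np.argmax(y, 3).shape  # class axis removed, one value per pixel

print(shape_axis1)  # (3, 200, 2)
print(shape_axis3)  # (3, 200, 200)
```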

TensorBoard is your friend: try plotting the input image in it, as well as your predictions, to see what they look like.

Start small, with a very small dataset of 10 images, and see if you can overfit it and predict almost the exact response.

To conclude, I am not sure of all my suggestions, but they are worth trying, and I hope this will help you on the path to success!

How are you able to see the dimensions of tensors on a tensorboard graph? — Kendall Weihe, Jun 18 '16 at 19:31
In the connections, they are usually written. The best way is to print `tensor.get_shape()` in your code to get the inferred shape of `tensor` — Olivier Moindrot, Jun 18 '16 at 19:32
Thanks a ton for your answer btw! One last thing... from what I can tell Tensorflow doesn't have a classifier function that classifies per pixel. In other words, what I really want is a simple `(pred[x][y] - y[x][y])` as my cost function, and then feed that into the optimizer — Kendall Weihe, Jun 18 '16 at 19:34
You should reshape the predictions to `[batch_size*200*200, 2]` and use `tf.nn.softmax_cross_entropy_with_logits`. You can see the detailed answer [here](http://stackoverflow.com/questions/35317029/how-to-implement-pixel-wise-classification-for-scene-labeling-in-tensorflow/37294185?noredirect=1#comment63253577_37294185) — Olivier Moindrot, Jun 18 '16 at 19:37
I think you mean `sparse_softmax...` right? Everything that you provided has been great, and it will be incorporated. First things first though I just want a couple of conv layers with a fully connected output layer for classifications. Later I will definitely use upsampling and what not. Can you see my latest question here: http://stackoverflow.com/questions/37901882/tensorflow-reshaping-a-tensor — Kendall Weihe, Jun 18 '16 at 21:41
...just to make things easier for the next reader: @olivier-moindrot 's link : https://stackoverflow.com/a/37294185/2184122 is important. (apparently) reshaping is no longer needed. — Robert Lugg, Jun 15 '18 at 16:54
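The reshape suggested in the comments above (flatten to `[batch_size*200*200, 2]`, then apply a softmax cross-entropy per row) can be sketched outside TensorFlow. The following is a NumPy stand-in (Python 3), not the TensorFlow op itself; `softmax_cross_entropy` is a hypothetical helper, and the small random logits and labels are made up for illustration:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Row-wise softmax cross-entropy for (N, C) logits and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

batch, h, w, n_classes = 2, 4, 4, 2
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch, h, w, n_classes))        # unscaled scores
classes = rng.integers(0, n_classes, size=(batch, h, w))  # per-pixel class ids
labels = np.eye(n_classes)[classes]                       # one-hot, (batch, h, w, 2)

# Flatten so that every pixel becomes one row of n_classes scores.
flat_logits = logits.reshape(-1, n_classes)
flat_labels = labels.reshape(-1, n_classes)

per_pixel_loss = softmax_cross_entropy(flat_logits, flat_labels)
cost = per_pixel_loss.mean()
print(per_pixel_loss.shape, cost)
```

The key point is that the function receives raw scores, not softmax outputs, and that each pixel contributes one independent 2-class term to the averaged cost.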
