
Problem

I'm trying to classify some 64x64 images as a black-box exercise. The neural network I have written doesn't update its weights. This is my first time writing something like this; the same code works just fine on MNIST letter input, but on this data it does not train like it should:

import tensorflow as tf
import numpy as np


path = ""

# x is a holder for the 64x64 image
x = tf.placeholder(tf.float32, shape=[None, 4096])

# y_ is a 1 element vector, containing the predicted probability of the label
y_ = tf.placeholder(tf.float32, [None, 1])

# define weights and biases
W = tf.Variable(tf.zeros([4096, 1]))
b = tf.Variable(tf.zeros([1]))

# define our model
y = tf.nn.softmax(tf.matmul(x, W) + b)

# loss is cross entropy
cross_entropy = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

# each gradient descent training step minimizes cross entropy
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

train_labels = np.reshape(np.genfromtxt(path + "train_labels.csv", delimiter=',', skip_header=1), (14999, 1))
train_data = np.genfromtxt(path + "train_samples.csv", delimiter=',', skip_header=1)

# perform 150 training steps, each using a batch of 100 training samples
for i in range(0, 15000, 100):
    sess.run(train_step, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]})
    if i % 500 == 0:
        print(sess.run(cross_entropy, feed_dict={x: train_data[i:i+100], y_: train_labels[i:i+100]}))
        print(sess.run(b), sess.run(W))

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


sess.close()

How do I solve this problem?


2 Answers


All your initial weights are zeros. When the weights are set up that way, the NN doesn't learn well. You need to initialize the weights with random values.

See Why should weights of Neural Networks be initialized to random numbers?

"Why Not Set Weights to Zero? We can use the same set of weights each time we train the network; for example, you could use the values of 0.0 for all weights.

In this case, the equations of the learning algorithm would fail to make any changes to the network weights, and the model will be stuck. It is important to note that the bias weight in each neuron is set to zero by default, not a small random value. "

See https://machinelearningmastery.com/why-initialize-a-neural-network-with-random-weights/
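
For example, a minimal sketch of swapping the zero initialization in the question's graph for random values (keeping the rest of the TF1 code unchanged; the stddev of 0.1 is just an illustrative choice, not something from the question):

import tensorflow as tf

# initialize the weights with small random values instead of zeros
W = tf.Variable(tf.truncated_normal([4096, 1], stddev=0.1))
# the bias can stay at zero
b = tf.Variable(tf.zeros([1]))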


The key to the problem is that the class number of your outputs y_ and y is 1. You should adopt one-hot encoding when you use tf.nn.softmax_cross_entropy_with_logits for classification problems in TensorFlow. tf.nn.softmax_cross_entropy_with_logits first computes tf.nn.softmax, and when your class number is 1, the results are all the same. For example:

import tensorflow as tf

# every row has only a single "class" column
y = tf.constant([[1], [0], [1]], dtype=tf.float32)
y_ = tf.constant([[1], [2], [3]], dtype=tf.float32)

# softmax over a single class is always 1, so the manual cross entropy is always 0
softmax_var = tf.nn.softmax(logits=y_)
cross_entropy = tf.multiply(y, tf.log(softmax_var))

# softmax_cross_entropy_with_logits gives 0 for the same reason
errors = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)

with tf.Session() as sess:
    print(sess.run(softmax_var))
    print(sess.run(cross_entropy))
    print(sess.run(errors))

[[1.]
 [1.]
 [1.]]
[[0.]
 [0.]
 [0.]]
[0. 0. 0.]

This means that no matter what your outputs are, your loss will be zero, so your weights and bias never get updated.

The solution is to modify the class number of y_ and y.

Suppose your class number is n.

First approach: convert the labels to one-hot encoding before feeding the data, then use the following code.

y_ = tf.placeholder(tf.float32, [None, n])
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
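
As a sketch of that conversion (assuming the labels are loaded as a (num_samples, 1) column of integer class ids in the range [0, n)), they could be turned into one-hot rows with NumPy before being fed:

import numpy as np

# train_labels has shape (num_samples, 1) and holds integer class ids in [0, n)
labels_int = train_labels.astype(int).reshape(-1)   # shape (num_samples,)
train_labels_onehot = np.eye(n)[labels_int]          # shape (num_samples, n)

# feed the one-hot rows instead of the raw column, e.g.
# sess.run(train_step, feed_dict={x: train_data[i:i+100], y_: train_labels_onehot[i:i+100]})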

Second approach: convert the labels to one-hot inside the graph, after feeding the data.

y_ = tf.placeholder(tf.int32, [None, 1])
y_ = tf.one_hot(y_, n)  # the dtype of y_ needs to be tf.int32
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))
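
A pitfall with the snippet above (which the comment thread below runs into): reassigning y_ means the feed_dict key no longer refers to the placeholder, and tf.one_hot on a (?, 1) tensor produces shape (?, 1, n). A sketch that keeps the placeholder and the one-hot tensor separate (the variable names here are mine, not from the answer) could look like:

y_raw = tf.placeholder(tf.int32, [None, 1])          # keep feeding labels of shape (?, 1)
y_onehot = tf.one_hot(tf.reshape(y_raw, [-1]), n)    # shape (?, n)
W = tf.Variable(tf.zeros([4096, n]))
b = tf.Variable(tf.zeros([n]))

logits = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot, logits=logits))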
  • How do I modify the class? Or how do I use the one hot approach? – Samuel Mar 18 '19 at 07:35
  • @Samuel I have added it to the answer. – giser_yugang Mar 18 '19 at 07:47
  • If I do the first, it can't read the data, since the labels have shape (?, 1) (I forgot to mention that in the labels.csv files there is one label on each line) while y_ is now (?, 8). If I do the second, it says "Cannot feed value of shape (100, 1) for Tensor 'one_hot:0', which has shape '(?, 8, 8)'". – Samuel Mar 18 '19 at 08:35
  • @Samuel You need to convert the labels to shape (?, 8) after reading the files and before feeding them when you use the first method. You need to keep the shape of the labels as (?, 1) when you use the second method. – giser_yugang Mar 18 '19 at 08:40
  • I can't figure out how to code this...and what the thinking process is – Samuel Mar 18 '19 at 10:09
  • @Samuel You can use the second method directly. You don't need to modify the input data shape. You just need to replace the corresponding lines in your code with mine. – giser_yugang Mar 18 '19 at 10:45
  • @Samuel Here's a similar question about `softmax`. You may take a look at https://stackoverflow.com/questions/55251319/tensorflow-initialization-gives-all-ones/55253250#55253250 . – giser_yugang Mar 20 '19 at 03:46