
I'm taking my first steps with deep learning and tensorflow. Therefore, I have some questions.

Following the tutorial and the Getting Started guide, I created a DNN with hidden layers as well as a simple softmax model. I used the dataset from https://archive.ics.uci.edu/ml/datasets/wine and split it into a train and a test dataset.

from __future__ import print_function
import tensorflow as tf


num_attributes = 13
num_types = 3


def read_from_csv(filename_queue):
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)
    record_defaults = [[] for col in range(
        num_attributes + 1)]
    attributes = tf.decode_csv(value, record_defaults=record_defaults)
    features = tf.stack(attributes[1:], name="features")
    labels = tf.one_hot(tf.cast(tf.stack(attributes[0], name="labels"), tf.uint8), num_types + 1, name="labels-onehot")
    return features, labels


def input_pipeline(filename='wine_train.csv', batch_size=30, num_epochs=None):
    filename_queue = tf.train.string_input_producer([filename], num_epochs=num_epochs, shuffle=True)
    features, labels = read_from_csv(filename_queue)

    min_after_dequeue = 2 * batch_size
    capacity = min_after_dequeue + 3 * batch_size
    feature_batch, label_batch = tf.train.shuffle_batch(
        [features, labels], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return feature_batch, label_batch


def train_and_test(hidden1, hidden2, learning_rate, epochs, train_batch_size, test_batch_size, test_interval):
    examples_train, labels_train = input_pipeline(filename="wine_train.csv", batch_size=train_batch_size)
    examples_test, labels_test = input_pipeline(filename="wine_test.csv", batch_size=test_batch_size)

    with tf.name_scope("first layer"):
        x = tf.placeholder(tf.float32, [None, num_attributes], name="input")
        weights1 = tf.Variable(
            tf.random_normal(shape=[num_attributes, hidden1], stddev=0.1), name="weights")
        bias = tf.Variable(tf.constant(0.0, shape=[hidden1]), name="bias")
        activation = tf.nn.relu(
            tf.matmul(x, weights1) + bias, name="relu_act")
        tf.summary.histogram("first_activation", activation)

    with tf.name_scope("second_layer"):
        weights2 = tf.Variable(
            tf.random_normal(shape=[hidden1, hidden2], stddev=0.1),
            name="weights")
        bias2 = tf.Variable(tf.constant(0.0, shape=[hidden2]), name="bias")
        activation2 = tf.nn.relu(
            tf.matmul(activation, weights2) + bias2, name="relu_act")
        tf.summary.histogram("second_activation", activation2)

    with tf.name_scope("output_layer"):
        weights3 = tf.Variable(
            tf.random_normal(shape=[hidden2, num_types + 1], stddev=0.5), name="weights")
        bias3 = tf.Variable(tf.constant(1.0, shape=[num_types+1]), name="bias")
        output = tf.add(
            tf.matmul(activation2, weights3, name="mul"), bias3, name="output")
        tf.summary.histogram("output_activation", output)

    y_ = tf.placeholder(tf.float32, [None, num_types+1])

    with tf.name_scope("loss"):
        cross_entropy = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=output))
        tf.summary.scalar("cross_entropy", cross_entropy)
    with tf.name_scope("train"):
        train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)

    with tf.name_scope("tests"):
        correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        tf.summary.scalar("accuracy", accuracy)

    summary_op = tf.summary.merge_all()
    sess = tf.InteractiveSession()
    writer = tf.summary.FileWriter("./wineDnnLow", sess.graph)
    tf.global_variables_initializer().run()
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)


    try:
        step = 0
        while not coord.should_stop() and step < epochs:
            #  train
            ex, lab = sess.run([examples_train, labels_train])
            _ = sess.run([train_step], feed_dict={x: ex, y_: lab})
            #  test
            if step % test_interval == 0:
                ex, lab = sess.run([examples_test, labels_test])
                summary, test_accuracy = sess.run([summary_op, accuracy], feed_dict={x: ex, y_: lab})
                writer.add_summary(summary, step)
                print("accuracy = {:f} at step {}".format(test_accuracy, step))
            step += 1
    except tf.errors.OutOfRangeError:
        print("Done training for %d steps" % (step))

    coord.request_stop()
    coord.join(threads)
    sess.close()



def main():
    train_and_test(10, 20, 0.5, 700, 30, 10, 1)


if __name__ == '__main__':
    main()

The problem is that the accuracy does not converge and seems to take random values. But when I try tf.contrib.learn.DNNClassifier, my data gets classified pretty well. So can anyone give me a hint where the problem is in my self-created DNN?

Moreover, I have a second question. During training I pass train_step to session.run(), but during testing I do not. Does this ensure that the weights are not influenced, i.e. that the graph does not learn from testing?

Edit: If I use the MNIST dataset and its batch handling instead of mine, the net behaves well. Therefore, I think the problem is caused by input_pipeline.

user98765
  • decrease learning rate, decrease stddev in all layers. In general - how did you come up with all these constants? It seems like you provided random initializer values for each variable. – lejlot Aug 29 '17 at 22:59
  • I tried different learning rates but the problem is still the same. Moreover, if I use the MNIST dataset with its batch handling the net works fine. Therefore, I think it should be caused somehow by my input_pipeline – user98765 Aug 29 '17 at 23:02

1 Answer


A quick glance at the dataset indicates to me the first thing I'd do is normalize it (subtract mean, divide by standard deviation). That said, it's still a very small dataset compared to MNIST, so don't expect everything to work exactly the same.
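
For instance, a minimal numpy sketch of that normalization, assuming `train_x` and `test_x` are float arrays of shape [n_samples, 13] holding the 13 attribute columns:

    import numpy as np

    # Compute the statistics on the training split only...
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0)

    # ...and apply the same statistics to both splits.
    train_x = (train_x - mean) / std
    test_x = (test_x - mean) / std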

If you're unsure of your input pipeline, just load all the data into memory rather than feeding it through the pipeline.
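
For example, a minimal sketch of loading the whole file with numpy and feeding it through the existing placeholders (file name and label-in-column-0 layout taken from the question; the one-hot width of 4 mirrors num_types + 1):

    import numpy as np

    # No header row, comma separated.
    data = np.loadtxt("wine_train.csv", delimiter=",", dtype=np.float32)
    labels_raw = data[:, 0].astype(np.int32)   # class labels 1..3
    features = data[:, 1:]                     # 13 attributes per sample

    # One-hot encode into num_types + 1 = 4 columns to match y_.
    labels = np.zeros((labels_raw.shape[0], 4), dtype=np.float32)
    labels[np.arange(labels_raw.shape[0]), labels_raw] = 1.0

    # Feed the arrays (or slices of them) through the placeholders:
    # sess.run(train_step, feed_dict={x: features, y_: labels})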

A few general notes:

  1. Your input pipeline isn't saving you any time. Your dataset is small, so I'd just use a feed_dict, but if it were massive you'd be better off removing the placeholders and just using the output of the input_pipeline (and building a separate graph for testing).
  2. Use the tf.layers API for common layer types. For example, your inference section can be effectively reduced to the following three lines.

    activation = tf.layers.dense(x, hidden1, activation=tf.nn.relu)
    activation2 = tf.layers.dense(activation, hidden2, activation=tf.nn.relu)
    output = tf.layers.dense(activation2, num_types+1)
    

(You won't have the same initialization, but you can specify those with optional arguments. The defaults are a good place to start though.)
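
For example, a sketch of the first layer keeping the question's random-normal weights and zero biases via those optional arguments:

    activation = tf.layers.dense(
        x, hidden1, activation=tf.nn.relu,
        kernel_initializer=tf.random_normal_initializer(stddev=0.1),
        bias_initializer=tf.zeros_initializer())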

  3. GradientDescentOptimizer is very primitive. My current favourite is AdamOptimizer, but experiment with others. If that looks too complex for you, MomentumOptimizer generally gives a good trade-off between complexity and performance benefits.
  4. Check out the tf.estimator.Estimator API. It'll make a lot of what you're doing much easier and force you to separate data loading from the model itself (a good thing).

  5. Check out the tf.contrib.data.Dataset API for data preprocessing. Queues have been around for a while in tensorflow so that's what most of the tutorials are written for, but the Dataset API is much more intuitive/easier in my opinion. Again, it's a bit overkill for this situation where you can load all data into memory easily. See this question for how to use a Dataset starting from a CSV file; a rough sketch is also included after this list.
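
A rough sketch for point 5, assuming the TF 1.3-style tf.contrib.data.TextLineDataset and the same wine_train.csv layout as in the question (label in column 0, then 13 float attributes, reusing the num_attributes/num_types constants); buffer and batch sizes are just example values:

    def parse_line(line):
        # One default per column; column 0 holds the class label (1..3).
        record_defaults = [[0.0]] * (num_attributes + 1)
        columns = tf.decode_csv(line, record_defaults=record_defaults)
        features = tf.stack(columns[1:])
        label = tf.one_hot(tf.cast(columns[0], tf.int32), num_types + 1)
        return features, label

    dataset = (tf.contrib.data.TextLineDataset("wine_train.csv")
               .map(parse_line)
               .shuffle(buffer_size=200)
               .batch(30)
               .repeat())  # loop over the file indefinitely

    features_batch, labels_batch = dataset.make_one_shot_iterator().get_next()
    # features_batch / labels_batch can be wired straight into the model,
    # so no feed_dict, queues or queue runners are needed.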

DomJack
  • Thanks. Just to make it clear: I use the overkill input_pipeline on this small dataset because later I want to work with bigger datasets, but I thought it would be easier to learn on a small one while still using "the proper" methods. – user98765 Aug 30 '17 at 07:40
  • Commendable - but imo get the simplest thing working first, then elaborate :). Bonus marks if you go and convert to `tfrecords` rather than parse each csv record every time you run through it in your dataset. Whatever you use (csv, tfrecords), you shouldn't be doing 2 session runs for each train step (1 to get the data, 1 to feed it to the main graph) - you should just link the two to avoid unnecessarily shipping data around the place. – DomJack Aug 30 '17 at 07:44
  • To avoid doing 2 session runs per train step, do I have to remove the placeholders and feed the tensors directly? Regarding "building a separate graph for testing": how do I get an extra graph with my trained state? Do I have to use tf.train.Saver to save and restore it, or is there another method? – user98765 Aug 30 '17 at 08:54
  • Do training and testing in separate scripts, one graph for each. They should obviously both call the same graph construction function with inputs from either the training or testing pipelines. – DomJack Aug 30 '17 at 12:18