
Out of curiosity, I am trying to build a simple fully connected NN using TensorFlow to learn a square wave function such as the following one (square wave image credit: www.thedawstudio.com).

The input is therefore a 1D array of x values (the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. There are 3 hidden layers (100*100*100), a single input node, and a single output node. The input data are generated to match the above wave shape, so data size is not a problem.

However, the trained model seems to fail completely, always predicting the negative class.

So I am trying to figure out why this happened: whether the NN configuration is suboptimal, or whether there is some mathematical flaw beneath the surface (though I think a NN should be able to approximate any function).

Thanks.


As per suggestions in the comment section, here is the full code. One thing I stated incorrectly earlier: there are actually 2 output nodes (due to the 2 output classes):

"""
    See if neural net can find piecewise linear correlation in the data
"""

import time
import os
import tensorflow as tf
import numpy as np

def generate_placeholder(batch_size):
    x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))
    y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))
    return x_placeholder, y_placeholder

def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):
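    # NOTE: [[None]] * batch_size creates batch_size references to one shared
    # inner list; this aliasing bug is diagnosed and fixed in the update below.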
    x_selected = [[None]] * batch_size
    y_selected = [None] * batch_size
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
    feed_dict = {x_placeholder: x_selected,
                 y_placeholder: y_selected}
    return feed_dict

def inference(input_x, H1_units, H2_units, H3_units):

    with tf.name_scope('H1'):
        weights = tf.Variable(tf.truncated_normal([1, H1_units], stddev=1.0/2), name='weights') 
        biases = tf.Variable(tf.zeros([H1_units]), name='biases')
        a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)

    with tf.name_scope('H2'):
        weights = tf.Variable(tf.truncated_normal([H1_units, H2_units], stddev=1.0/H1_units), name='weights') 
        biases = tf.Variable(tf.zeros([H2_units]), name='biases')
        a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)

    with tf.name_scope('H3'):
        weights = tf.Variable(tf.truncated_normal([H2_units, H3_units], stddev=1.0/H2_units), name='weights') 
        biases = tf.Variable(tf.zeros([H3_units]), name='biases')
        a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)

    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(tf.truncated_normal([H3_units, 2], stddev=1.0/np.sqrt(H3_units)), name='weights') 
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a3, weights) + biases

    return logits

def loss(logits, labels):
    labels = tf.to_int32(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')

def inspect_y(labels):
    return tf.reduce_sum(tf.cast(labels, tf.int32))

def training(loss, learning_rate):
    tf.summary.scalar('loss', loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op

def evaluation(logits, labels):
    labels = tf.to_int32(labels)
    correct = tf.nn.in_top_k(logits, labels, 1)
    return tf.reduce_sum(tf.cast(correct, tf.int32))

def run_training(x, y, batch_size):
    with tf.Graph().as_default():
        x_placeholder, y_placeholder = generate_placeholder(batch_size)
        logits = inference(x_placeholder, 100, 100, 100)
        Loss = loss(logits, y_placeholder)
        y_sum = inspect_y(y_placeholder)
        train_op = training(Loss, 0.01)
        init = tf.global_variables_initializer()
        sess = tf.Session()
        sess.run(init)
        max_steps = 10000
        for step in range(max_steps):
            start_time = time.time()
            feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)
            _, loss_val = sess.run([train_op, Loss], feed_dict = feed_dict)
            duration = time.time() - start_time
            if step % 100 == 0:
                print('Step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))
    x_test = np.array(range(1000)) * 0.001
    x_test = np.reshape(x_test, (1000, 1))
    _ = sess.run(logits, feed_dict={x_placeholder: x_test})
    print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))
    print(_)

if __name__ == '__main__':

    population = 10000

    input_x = np.random.rand(population)
    input_y = np.copy(input_x)

    for bin in range(10):
        print(bin, bin/10, 0.5 - 0.5*(-1)**bin)
        input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin

    batch_size = 1000

    input_x = np.reshape(input_x, (population, 1))

    run_training(input_x, input_y, batch_size)

The sample output shows that the model always prefers the first class over the second, as shown by min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit for the first class is higher than the maximum logit for the second class, over the entire test sample.
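For reference, a quick way to confirm this collapse numerically (a standalone check, assuming _ holds the (1000, 2) logits array printed above):

import numpy as np

predictions = np.argmax(_, axis=1)            # predicted class per test point
print(np.bincount(predictions, minlength=2))  # e.g. [1000    0]: class 0 always wins
print(_[:, 0].min() > _[:, 1].max())          # True when class 0 dominates everywhere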


My mistake. The problem occurred in these lines:

for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]

Python is aliasing every row of x_selected to the same inner list: [[None]] * batch_size creates batch_size references to a single list, so each assignment to x_selected[i][0] overwrites all rows with the same value.
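A minimal standalone demonstration of the aliasing pitfall (not from the original post):

x_selected = [[None]] * 3   # three references to the SAME inner list
x_selected[0][0] = 42       # writing through any one reference...
print(x_selected)           # [[42], [42], [42]]: all rows changed

Now this code issue is resolved. The fix is: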

x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]

After this fix, the model shows more variation. It currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5, but this is still far from ideal.


After changing the network configuration to 4 layers of 100 nodes each and training for 1 million steps (batch size = 100, sample size = 10 million), the model performs very well, showing errors only at the edges where y flips. Therefore this question is closed.
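For completeness, a minimal sketch of what the 4-hidden-layer inference function could look like, following the same pattern (and imports) as inference above; the initialization scheme here is an assumption, since the final 4-layer code was not posted:

def inference_4layer(input_x, units=100):
    a, in_dim = input_x, 1
    for layer in range(4):  # 4 hidden layers of 100 ReLU units each
        with tf.name_scope('H{}'.format(layer + 1)):
            weights = tf.Variable(tf.truncated_normal([in_dim, units], stddev=1.0/np.sqrt(in_dim)), name='weights')
            biases = tf.Variable(tf.zeros([units]), name='biases')
            a = tf.nn.relu(tf.matmul(a, weights) + biases)
        in_dim = units
    with tf.name_scope('softmax_linear'):  # linear output layer, 2 classes
        weights = tf.Variable(tf.truncated_normal([units, 2], stddev=1.0/np.sqrt(units)), name='weights')
        biases = tf.Variable(tf.zeros([2]), name='biases')
        logits = tf.matmul(a, weights) + biases
    return logits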

wenduowang
  • What exactly do you mean by "predicting for the negative class always"? Do you mean your output is always negative? And why don't you use the entire lineshape (say, one period) as input? For example, 100 points as input and 100 points as output? – Umberto Sep 14 '17 at 06:21
  • Can you post your network architecture code? – nessuno Sep 14 '17 at 09:03

2 Answers


You are essentially trying to learn a periodic function, and the function is highly non-linear and non-smooth. So it is NOT as simple as it looks. In short, a better representation of the input features helps.

Suppose you have a period T = 2, i.e. f(x) = f(x+2). For the reduced problem where inputs/outputs are integers, your function is then f(x) = 1 if x is odd, else -1. In this case, your problem reduces to this discussion, in which we train a neural network to distinguish between odd and even numbers.

I guess the second bullet in that post should help (even for the general case where inputs are floats).

Try representing the numbers in binary using a fixed length precision.

In our reduced problem above, it's easy to see that the output is determined once the least-significant bit is known.

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1
...
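A minimal sketch of such a fixed-length binary encoding for the general case, assuming inputs in [0, 1) as in the question and a hypothetical bit width n_bits (not from the original post):

import numpy as np

def to_fixed_binary(x, n_bits=8):
    # Encode floats in [0, 1) as n_bits binary features, most significant bit first.
    scaled = (np.asarray(x) * (1 << n_bits)).astype(np.int64)
    bits = [(scaled >> (n_bits - 1 - b)) & 1 for b in range(n_bits)]
    return np.stack(bits, axis=-1).astype(np.float32)

print(to_fixed_binary([0.3, 0.72], n_bits=4))  # shape (2, 4), one column per bit

In the reduced integer problem above, the label would then be readable directly from the last column.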
greeness
  • When your wave function crosses 0, is your target value -1, 0 or 1? Do you see the paradox here? – klubow Sep 14 '17 at 05:47
  • I think we should make a choice for the general problem where inputs are floats, something like this: https://i.stack.imgur.com/u2eJL.png – greeness Sep 14 '17 at 05:54
  • So for an unknown problem, what should I do to generalize this approach so the NN can detect such patterns? For example, a geometric series, where x = {1, 2, 4, 8, ...} and y = {1 if 2^n <= x < 2^(n+1) else 0; n = {0, 2, 4, ...}} – wenduowang Sep 14 '17 at 15:15
  • I have no idea for a general approach. Fortunately, in practice, neural networks work well. Does your updated code and result also show that, as long as you define a finite horizon, the neural network can work decently? – greeness Sep 14 '17 at 16:43
  • I tried a deeper network with 4 layers and 100 nodes each. Now the result looks better, with some nonlinear behaviour. I believe that with more data and more time (it all comes down to time) the performance will become decent. Thanks! – wenduowang Sep 14 '17 at 17:35

I created the model and the structure for the problem of recognizing odd/even numbers here.

If you observe that this mapping:

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1

is almost equivalent to:

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> 0
3:       0 1 1   -> 1

then you may update the code to fit your needs.
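For instance, since sparse_softmax_cross_entropy_with_logits expects class labels in {0, 1}, remapping {-1, 1} targets could be as simple as this one-line sketch (variable names hypothetical, not from the linked code):

y01 = (y + 1) // 2   # maps -1 -> 0 and +1 -> 1 for integer targets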

prosti