Recurrent NN for prediction doesn't learn

Question

I'm trying to build a recurrent neural network for prediction. I'm doing it in PyBrain.

I've created two simple scripts to test the ideas and techniques before moving on to implementing them to something more complex.

I've tried to follow the code that is proven to work as much as I can, that is: On stackoverflow, and on github.

In the first example I'm trying to predict sin values given a timeframe of past values:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""An example of a simple RNN."""

import time
import math
import matplotlib.pyplot as plt

from normalizator import Normalizator

from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import LSTMLayer
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.supervised import RPropMinusTrainer
from pybrain.datasets import SupervisedDataSet
from pybrain.datasets import SequentialDataSet
import pybrain.datasets.sequential


class Network(object):
    """Sieć neuronowa."""

    def __init__(self, inputs, hidden, outputs):
        """Just a constructor."""
        self.inputs = inputs
        self.outputs = outputs
        self.hidden = hidden
        self.network = self.build_network(inputs, hidden, outputs)
        self.norm = Normalizator()

    def build_network(self, inputs, hidden, outputs):
        """Builds the network."""
        network = buildNetwork(inputs, hidden, outputs,
                               hiddenclass=LSTMLayer,
                               #hiddenclass=SigmoidLayer,
                               outclass=SigmoidLayer,
                               bias = True,
                               outputbias=False, recurrent=True)
        network.sortModules()
        print "Constructed network:"
        print network
        return network

    def train(self, learning_set, max_terations=100):
        """Trains the network."""
        print "\nThe network is learning..."
        time_s = time.time()
        self.network.randomize()
        #trainer = RPropMinusTrainer(self.network, dataset=learning_set,
        #                            verbose=True)
        learning_rate = 0.05
        trainer = BackpropTrainer(self.network, learning_set, verbose=True,
                                  momentum=0.8, learningrate=learning_rate)
        errors = trainer.trainUntilConvergence(maxEpochs=max_terations)
        #print "Last error in learning:", errors[-1]
        time_d = time.time() - time_s
        print "Learning took %d seconds." % time_d
        return errors, learning_rate

    def test(self, data):
        """Tests the network."""
        print ("X\tCorrect\tOutput\t\tOutDenorm\tError")
        mse = 0.0
        outputs = []
        #self.network.reset()
        for item in data:
            x_val = self.norm.denormalize("x", item[0])
            sin_val = self.norm.denormalize("sin", item[1])
            #get the output from the network
            output = self.network.activate(item[0])[0]
            out_denorm = self.norm.denormalize("sin", output)
            outputs.append(out_denorm)
            #compute the error
            error = sin_val - out_denorm
            mse += error**2
            print "%f\t%f\t%f\t%f\t%f" % \
                (round(x_val, 2), sin_val, output, out_denorm, error)
        mse = mse / float(len(data))
        print "MSE:", mse
        return outputs, mse

    def show_plot(self, correct, outputs, learn_x, test_x,
                  learning_targets, mse):
        """Plots some useful stuff :)"""
        #print "learn_x:", learn_x
        #print "test_x:", test_x
        #print "output:", outputs
        #print "correct:", correct
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.plot(test_x, outputs, label="Prediction", color="red")
        ax.plot(test_x, correct, ":", label="Original data")
        ax.legend(loc='upper left')
        plt.xlabel('X')
        plt.ylabel('Sinus')
        plt.title('Sinus... (mse=%f)' % mse)
        #plot a portion of the learning data
        learning_plt = fig.add_subplot(111)
        learn_index = int(0.9 * len(learning_targets))
        learning_plt.plot(learn_x[learn_index:], learning_targets[learn_index:],
                          label="Learning values", color="blue")
        learning_plt.legend(loc='upper left')
        plt.show()

    def prepare_data(self):
        """Prepares the data."""
        learn_inputs = [round(x, 2) for x in [y * 0.05 for y in range(0, 4001)]]
        learn_targets = [math.sin(z) for z in learn_inputs]

        test_inputs = [round(x, 2) for x in [y * 0.05 for y in range(4001, 4101)]]
        test_targets = [math.sin(z) for z in test_inputs]

        self.norm.add_feature("x", learn_inputs + test_inputs)
        self.norm.add_feature("sin", learn_targets + test_targets)

        #learning_set = pybrain.datasets.sequential.SupervisedDataSet(1, 1)
        learning_set = SequentialDataSet(1, 1)
        targ_close_to_zero = 0
        for inp, targ in zip(learn_inputs, learn_targets):
            if abs(targ) < 0.01:
                targ_close_to_zero += 1
            #if inp % 1 == 0.0:
            if targ_close_to_zero == 2:
                print "New sequence at", (inp, targ)
                targ_close_to_zero = 0
                learning_set.newSequence()
            learning_set.appendLinked(self.norm.normalize("x", inp),
                                      self.norm.normalize("sin", targ))

        testing_set = []
        for inp, targ in zip(test_inputs, test_targets):
            testing_set.append([self.norm.normalize("x", inp),
                               self.norm.normalize("sin", targ), inp, targ])
        return learning_set, testing_set, learn_inputs, test_inputs, learn_targets

if __name__ == '__main__':
    nnetwork = Network(1, 20, 1)
    learning_set, testing_set, learning_inputs, testing_inputs, learn_targets = \
        nnetwork.prepare_data()
    errors, rate = nnetwork.train(learning_set, 125)
    outputs, mse = nnetwork.test(testing_set)
    correct = [element[3] for element in testing_set]
    nnetwork.show_plot(correct, outputs,
                       learning_inputs, testing_inputs, learn_targets, mse)

Results are tragic, to say the least.

X       Correct     Output      OutDenorm   Error

200.050000  -0.847857   0.490775    -0.018445   -0.829411
200.100000  -0.820297   0.490774    -0.018448   -0.801849
200.150000  -0.790687   0.490773    -0.018450   -0.772237
200.200000  -0.759100   0.490772    -0.018452   -0.740648
200.250000  -0.725616   0.490770    -0.018454   -0.707162

This is insane.

The second one is similar, based on the sun spots data:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""An example of a simple RNN."""

import argparse
import sys
import operator
import time

from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import FullConnection
from pybrain.structure.modules import LSTMLayer
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.supervised import RPropMinusTrainer
from pybrain.datasets import SupervisedDataSet
import pybrain.datasets.sequential

import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

from normalizator import Normalizator


class Network(object):
    """Neural network."""

    def __init__(self, inputs, hidden, outputs):
        """Constructor."""
        self.inputs = inputs
        self.outputs = outputs
        self.hidden = hidden
        self.network = self.build_network(inputs, hidden, outputs)
        self.norm = Normalizator()

    def build_network(self, inputs, hidden, outputs):
        """Builds the network."""
        network = buildNetwork(inputs, hidden, outputs, bias=True,
                               hiddenclass=LSTMLayer,
                               #hiddenclass=SigmoidLayer,
                               outclass=SigmoidLayer,
                               outputbias=False, fast=False, recurrent=True)
        #network.addRecurrentConnection(
        #    FullConnection(network['hidden0'], network['hidden0'], name='c3'))
        network.sortModules()
        network.randomize()
        print "Constructed network:"
        print network
        return network

    def train(self, learning_set, max_terations=100):
        """Trains the network."""
        print "\nThe network is learning..."
        time_s = time.time()
        trainer = RPropMinusTrainer(self.network, dataset=learning_set,
                                    verbose=True)
        learning_rate = 0.001
        #trainer = BackpropTrainer(self.network, learning_set, verbose=True,
        #          batchlearning=True, momentum=0.8, learningrate=learning_rate)
        errors = trainer.trainUntilConvergence(maxEpochs=max_terations)
        #print "Last error in learning:", errors[-1]
        time_d = time.time() - time_s
        print "Learning took %d seconds." % time_d
        return errors, learning_rate

    def test(self, data):
        """Tests the network."""
        print ("Year\tMonth\tCount\tCount_norm\t" +
                "Output\t\tOutDenorm\tError")
        # do the testing
        mse = 0.0
        outputs = []
        #print "Test data:", data
        for item in data:
            #month = self.norm.denormalize("month", item[1])
            #year = self.norm.denormalize("year", item[2])
            year, month = self.norm.denormalize("ym", item[5])
            count = self.norm.denormalize("count", item[3])
            #get the output from the network
            output = self.network.activate((item[1], item[2]))
            out_denorm = self.norm.denormalize("count", output[0])
            outputs.append(out_denorm)
            #compute the error
            error = count - out_denorm
            mse += error**2
            print "%d\t%d\t%s\t%f\t%f\t%f\t%f" % \
                (year, month, count, item[3],
                 output[0], out_denorm, error)
        mse /= len(data)
        print "MSE:", mse
        #corrects = [self.norm.denormalize("count", item[3]) for item in data]
        #print "corrects:", len(corrects)
        return outputs, mse

    def show_plot(self, correct, outputs, learn_x, test_x,
                  learning_targets, mse):
        """Rysuje wykres :)"""
        #print "x_axis:", x_axis
        #print "output:", output
        #print "correct:", correct
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.plot(test_x, outputs, label="Prediction", color="red")
        ax.plot(test_x, correct, ":", label="Correct")
        #                                               int(201000.0 / 100)
        ax.xaxis.set_major_formatter(FormatStrFormatter('%s'))
        ax.legend(loc='upper left')
        learn_index = int(0.8 * len(learn_x))
        learn_part_x = learn_x[learn_index:]
        learn_part_vals = learning_targets[learn_index:]
        learning_plt = fig.add_subplot(111)
        learning_plt.plot(learn_part_x, learn_part_vals,
                          label="Learning values", color="blue")
        learning_plt.legend(loc='upper left')
        plt.xlabel('Year-Month')
        plt.ylabel('Values')
        plt.title('... (mse=%f)' % mse)
        plt.show()

    def read_data(self, learnfile, testfile):
        """Wczytuje dane uczące oraz testowe."""
        #read learning data
        data_learn_tmp = []
        for line in learnfile:
            if line[1] == "#":
                continue
            row = line.split()
            year = float(row[0][0:4])
            month = float(row[0][4:6])
            yearmonth = int(row[0])
            count = float(row[2])
            data_learn_tmp.append([month, year, count, yearmonth])
        data_learn_tmp = sorted(data_learn_tmp, key=operator.itemgetter(1, 0))
        # read test data
        data_test_tmp = []
        for line in testfile:
            if line[0] == "#":
                continue
            row = line.split()
            year = float(row[0][0:4])
            month = float(row[0][4:6])
            count = float(row[2])
            year_month = int(row[0])
            data_test_tmp.append([month, year, count, year_month])
        data_test_tmp = sorted(data_test_tmp, key=operator.itemgetter(1, 0))
        # prepare data for normalization
        months = [item[0] for item in data_learn_tmp + data_test_tmp]
        years = [item[1] for item in data_learn_tmp + data_test_tmp]
        counts = [item[2] for item in data_learn_tmp + data_test_tmp]
        self.norm.add_feature("month", months)
        self.norm.add_feature("year", years)
        ym = [(years[index], months[index]) for index in xrange(0, len(years))]
        self.norm.add_feature("ym", ym, ranked=True)
        self.norm.add_feature("count", counts)
        #build learning data set
        learning_set = pybrain.datasets.sequential.SequentialDataSet(2, 1)
        #learning_set = pybrain.datasets.sequential.SupervisedDataSet(2, 1)
        # add items to the learning dataset proper
        last_year = -1
        for item in data_learn_tmp:
            if last_year != item[1]:
                learning_set.newSequence()
                last_year = item[1]
            year_month = self.norm.normalize("ym", (item[1], item[0]))
            count = self.norm.normalize("count", item[2])
            learning_set.appendLinked((year_month), (count))
        #build testing data set proper
        words = ["N/A"] * len(data_test_tmp)
        testing_set = []
        for index in range(len(data_test_tmp)):
            month = self.norm.normalize("month", data_test_tmp[index][0])
            year = self.norm.normalize("year", data_test_tmp[index][3])
            year_month = self.norm.normalize("ym",
                        (data_test_tmp[index][4], data_test_tmp[index][0]))
            count = self.norm.normalize("count", data_test_tmp[index][5])
            testing_set.append((words[index], month, year,
                                count, data_test_tmp[index][6], year_month))
        #learning_set, testing_set, learn_inputs, test_inputs, learn_targets
        learn_x = [element[3] for element in data_learn_tmp]
        test_x = [element[3] for element in data_test_tmp]
        learn_targets = [element[2] for element in data_learn_tmp]
        test_targets = [element[2] for element in data_test_tmp]
        return (learning_set, testing_set, learn_x, test_x,
                learn_targets, test_targets)


def get_args():
    """Buduje parser cli."""
    parser = argparse.ArgumentParser(
        description='Trains a simple recurrent neural network.')

    parser.add_argument('--inputs', type=int, default=2,
                        help='Number of input neurons.')
    parser.add_argument('--hidden', type=int, default=5,
                        help='Number of hidden neurons.')
    parser.add_argument('--outputs', type=int, default=1,
                        help='Number of output neurons.')

    parser.add_argument('--iterations', type=int, default=100,
                help='Maximum number of iteration epoch in training phase.')

    parser.add_argument('trainfile', nargs='?', type=argparse.FileType('r'),
                        default=sys.stdin, help="File with learning dataset.")
    parser.add_argument('testfile', nargs='?', type=argparse.FileType('r'),
                        default=sys.stdin, help="File with testing dataset.")

    parser.add_argument('--version', action='version', version='%(prog)s 1.0')

    return parser.parse_args()

if __name__ == '__main__':
    args = get_args()
    nnetwork = Network(args.inputs, args.hidden, args.outputs)
    learning_set, testing_set, learn_x, test_x, learn_targets, test_targets = \
        nnetwork.read_data(args.trainfile, args.testfile)
    errors, rate = nnetwork.train(learning_set, args.iterations)
    outputs, mse = nnetwork.test(testing_set)
    nnetwork.show_plot(test_targets, outputs,
                       learn_x, test_x, learn_targets, mse)

And here also, I see only chaos, which I cannot show you on the plot as I don't have enough reputation points. But basically, the prediction function is a cyclical teeth-shaped curve that doesn't correlate much with the inputs or the past data.

Year    Month   Count   Count_norm  Output      OutDenorm   Error
2009    9       4.3     0.016942    0.216687    54.995108   -50.695108
2009    10      4.8     0.018913    0.218810    55.534015   -50.734015
2009    11      4.1     0.016154    0.221876    56.312243   -52.212243
2009    12      10.8    0.042553    0.224774    57.047758   -46.247758
2010    1       13.2    0.052009    0.184361    46.790833   -33.590833
2010    2       18.8    0.074074    0.181018    45.942258   -27.142258
2010    3       15.4    0.060678    0.183226    46.502806   -31.102806

I've tried with two different learning algorithms, many combinations of hidden units, learning rates, types of adding elements into the learning dataset, but to no avail.

I'm completely lost now.

I would kindly suggest to reshape your question to something more specific. Nobody can guarantee to you that a neural network would actually fit for any problem. What is exactly that you are asking? — Pantelis Natsiavas, Nov 23 '13 at 17:49
@PantelisNatsiavas, thanks for the interest in my issue. Of course, nobody can guarantee that a neural network will work for any given problem. However there are papers describing the usage of recurrent neural networks for time series prediction / regression. Simple regression of sin function should therefore not be that much of a problem for RNNs. I'm asking for any hint/idea that may lead me to fixing my problem of NN not converging/learning. — Bartosz, Nov 23 '13 at 18:28
@BartoszKP, the errors in the listings are calculated as follows: expected_value - computed_value. Each of which is a denormalized value. — Bartosz, Nov 23 '13 at 22:02
@Bartosz I asked not about the error measure, but *using what data* the error was calculated. Is the error displayed in the listings a result of calculating the error on the data points you used for training, or the ones used for testing? — BartoszKP, Nov 23 '13 at 22:17
@BartoszKP, the errors were obtained from testing data. Learning is very slow: errors that I see in the learning process are diminishing only by a very small value. — Bartosz, Nov 24 '13 at 10:50
@Bartosz Perhaps try two things (independently, then together): increasing the `max_iterations` limit (`100` is very small number in relation to the amount of training data) and lowering the number of training examples. — BartoszKP, Nov 24 '13 at 11:21
@BartoszKP, increasing the number of iterations did not do much as the errors are not getting that much smaller after about 50-60 iterations. I've tried cuting the number of training examples by 2/3, but it also did not help much. And the combination of the two techniques did not improve the results. I'll try a few more things and report back :) — Bartosz, Nov 24 '13 at 17:05

lennon310 · Answer 1 · 2013-11-26T16:50:43.283

5

If you are using a logistic activation function in the output layer, the output will be restricted to the range (0,1). But your sin function provides the output with the range of (-1,1). I think that's why your sin learning is difficult to converge to a small errors. You cannot even get the correct prediction of sin function in your training data, can you? Perhaps you may need to scale your input/output set before training and testing.

edited Nov 26 '13 at 16:50

answered Nov 26 '13 at 16:41

lennon310

12,503
11
43
61

thanks for the input. Indeed the scales of the network output and sinus funtions are somewhat different. That's why I normalize everything I put into the network to the range [0, 1] and then I denormalize network's output to the original values. I've found that the normal BackPropagation algorithm works much worse than RPropMinusTrainer. Now I work on my "real" data, i.e. the data I really want to predict, not just to ply around with like with sunspots and sinuses. I'm getting better results. It may be the nature of the data + some code clean up. But what do I do with this question?... – Bartosz Nov 30 '13 at 18:12

Recurrent NN for prediction doesn't learn

1 Answers1