I am involved with an application that needs to estimate the state of a certain system in real time by measuring a set of (non-linearly) dependent parameters. Until now the application has used an extended Kalman filter, but it was found to underperform in certain circumstances, most likely because the differences between the real system and the model used in the filter are too significant to be modeled as white noise. We cannot use a more precise model for a number of unrelated reasons.

We decided to try recurrent neural networks for the task. Since my experience with neural networks is quite limited, I decided to practice on a hand-crafted problem before tackling the real task itself. However, I could not solve even that practice problem, so I'm asking for help here.

Here's what I did: I generated sine waveforms of varying phase, frequency, amplitude, and offset, distorted them with white noise, and (unsuccessfully) attempted to train an LSTM network to recover the original waveforms from the noisy signal. I expected that the network would eventually learn to fit a sine waveform to the noisy data.

Here's the source (slightly abridged, but it should work):

#!/usr/bin/env python3

import time
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.layers.wrappers import TimeDistributed
from keras.objectives import mean_absolute_error, cosine_proximity

# Each waveform is sampled at 10k points over [0, 100].
POINTS_PER_WF = int(1e4)
X_SPACE = np.linspace(0, 100, POINTS_PER_WF)

def make_waveform_with_noise():
    def add_noise(vec):
        stdev = float(np.random.uniform(0.01, 0.2))
        return vec + np.random.normal(0, stdev, size=len(vec))

    # Random phase (sin vs cos), frequency, amplitude, and DC offset.
    f = np.random.choice((np.sin, np.cos))
    wf = f(X_SPACE * np.random.normal(scale=5)) *\
         np.random.normal(scale=5) + np.random.normal(scale=50)
    return wf, add_noise(wf)

RESCALING = 1e-3                       # keep values roughly within the tanh output range
BATCH_SHAPE = (1, POINTS_PER_WF, 1)    # (batch, timesteps, features)

model = Sequential([
    # TimeDistributed applies the same Dense layer to every timestep independently.
    TimeDistributed(Dense(5, activation='tanh'), batch_input_shape=BATCH_SHAPE),
    LSTM(20, activation='tanh', inner_activation='sigmoid', return_sequences=True),
    LSTM(20, activation='tanh', inner_activation='sigmoid', return_sequences=True),
    TimeDistributed(Dense(1, activation='tanh'))
])

def compute_loss(y_true, y_pred):
    # Score only the second half of each sequence, so the network gets to
    # observe a few periods of the signal first.
    skip_first = POINTS_PER_WF // 2
    y_true = y_true[:, skip_first:, :] * RESCALING
    y_pred = y_pred[:, skip_first:, :] * RESCALING
    me = mean_absolute_error(y_true, y_pred)
    cp = cosine_proximity(y_true, y_pred)
    return me + cp

model.summary()
model.compile(optimizer='adam', loss=compute_loss,
              metrics=['mae', 'cosine_proximity'])

NUM_ITERATIONS = 30000

# Online training: one freshly generated waveform per step, batch size 1.
for iteration in range(NUM_ITERATIONS):
    wf, noisy_wf = make_waveform_with_noise()
    y = wf.reshape(BATCH_SHAPE) * RESCALING
    x = noisy_wf.reshape(BATCH_SHAPE) * RESCALING
    info = model.train_on_batch(x, y)

model.save_weights('final.hdf5')

The first dense layer is actually useless; I added it only to make sure I could successfully combine LSTM and time-distributed dense layers, since my real application will likely need that setup.

The loss function was modified a number of times. Initially I used plain mean squared error, but training was extremely slow and mostly converged to simply copying the noisy input signal to the output. The cosine proximity term, which I added later, essentially measures how similar the shapes of the two functions are; it seemed to speed up learning quite a bit. Also note that I apply the loss only to the last half of each sequence; the motivation is that I expected the network to need to see a few periods of the signal before it could correctly identify the parameters of the waveform. However, I found that this modification had no visible effect on the performance of the network.

The latest version of the script uses the Adam optimizer. I also experimented with RMSProp with various learning rate and decay settings, but found no noticeable difference in the behavior of the network.
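For reference, one of the RMSProp configurations I tried looked roughly like the following (the concrete learning rate and decay values here are placeholders, not the exact settings from my experiments):

from keras.optimizers import RMSprop

# Placeholder lr/decay values; the loss and metrics are the same as above.
model.compile(optimizer=RMSprop(lr=1e-3, rho=0.9, decay=1e-6),
              loss=compute_loss,
              metrics=['mae', 'cosine_proximity'])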

I am using the Theano 0.9 (dev) backend configured to use 64-bit floating point in order to prevent possible issues with numerical stability. The epsilon value is set accordingly to 1e-14.
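In my setup this is configured via ~/.keras/keras.json and the Theano flags before anything is imported; the calls below are just the programmatic equivalents, shown for completeness:

from keras import backend as K

K.set_floatx('float64')   # matches floatX=float64 on the Theano side
K.set_epsilon(1e-14)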

This is what the output looks like after 15k..30k training steps (performance stops improving at about 15k steps); the first plot is zoomed in for the sake of clarity. A sketch of how such predictions are obtained from the trained model is shown after the plot legend.

[Two plots showing the noisy input, the recovered output, and the ground truth; see the legend below]

Plot legend:

  • blue (0) - noisy signal, input of the RNN
  • green (1) - recovered signal, output of the RNN
  • red (2) - ground truth
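
The plotting code was stripped from the abridged script above, so the snippet below is only an illustrative reconstruction of how the predictions and plots are produced:

import matplotlib.pyplot as plt

wf, noisy_wf = make_waveform_with_noise()
x = noisy_wf.reshape(BATCH_SHAPE) * RESCALING
recovered = model.predict_on_batch(x).reshape(-1) / RESCALING

plt.plot(noisy_wf, label='0: noisy input')
plt.plot(recovered, label='1: RNN output')
plt.plot(wf, label='2: ground truth')
plt.legend()
plt.show()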

My question is: what am I doing wrong?

Pavel Kirienko

    Not sure, but I think you need a lot more parameters (more layers, more neurons/cells) to get that job done. Also, a Conv1D layer may be much better than dense layers. -- This paper may also help, although I wasn't able to fully understand it yet: https://arxiv.org/abs/1609.03499 – Daniel Möller May 19 '17 at 14:08

0 Answers