25

I have implemented a simple neural network framework which supports only multi-layer perceptrons and plain backpropagation. It works okay-ish for linear classification and the usual XOR problem, but for sine function approximation the results are not that satisfying.

I'm basically trying to approximate one period of the sine function with one hidden layer consisting of 6-10 neurons. The network uses hyperbolic tangent as the activation function for the hidden layer and a linear function for the output. The result remains quite a rough estimate of the sine wave and takes a long time to calculate.

I looked at encog for reference, but even with that I fail to get it to work with simple backpropagation (switching to resilient propagation makes it better, but it is still way worse than the super slick R script provided in this similar question). So am I actually trying to do something that's not possible? Is it not possible to approximate sine with simple backpropagation (no momentum, no dynamic learning rate)? What is the actual method used by the neural network library in R?

EDIT: I know that it is definitely possible to find a good-enough approximation even with simple backpropagation (if you are incredibly lucky with your initial weights), but I was actually more interested in whether this is a feasible approach. The R script I linked to just seems to converge incredibly fast and robustly (in 40 epochs with only a few training samples) compared to my implementation or even encog's resilient propagation. I'm just wondering if there is something I can do to improve my backpropagation algorithm to get that same performance, or do I have to look into some more advanced learning method?
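
To make the setup concrete, here is a minimal NumPy sketch of this kind of network (one tanh hidden layer, linear output, plain batch gradient descent with a fixed learning rate, no momentum); the layer size, learning rate and epoch count are arbitrary illustrative values, not the exact ones I use:

import numpy as np

rng = np.random.default_rng(0)

# One period of sine as training data
X = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
Y = np.sin(X)

n_hidden = 8      # hidden layer size (illustrative)
lr = 0.01         # fixed learning rate, no momentum, no adaptation

# Small random initial weights
W1 = rng.normal(0.0, 0.5, (1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)

for epoch in range(20000):
    # Forward pass: tanh hidden layer, linear output
    H = np.tanh(X @ W1 + b1)
    out = H @ W2 + b2

    # Squared-error gradient and plain backpropagation
    d_out = 2 * (out - Y) / len(X)
    dW2 = H.T @ d_out
    db2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - H ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_hidden
    db1 = d_hidden.sum(axis=0)

    # Vanilla gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2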

desertnaut
Muton
  • Did you ever get it to work? Facing the same problem. – Lex Podgorny Oct 29 '16 at 14:43
  • Don't think so but can't really recall all the details anymore as this was 4 years ago. The nnet package mentioned above is implemented in C and is only 700 lines of code and then some R wrapping on top of it. Perhaps looking into that will give you some ideas. – Muton Oct 31 '16 at 09:40
  • I am implementing this in C/C++. My network uses 6 neurons in the single hidden layer, with tanh activation in the hidden layer and linear activation in the output layer. It converges in 1600 epochs without using momentum or an optimizer. Is that what you are asking for, or is 40 epochs the benchmark for your target? I am working on adding an optimizer and momentum to my network and will share that here. Should I share my method so far? – Pe Dro Dec 15 '19 at 03:08
  • If you have a well performing model, and can provide a clear and concise answer, please share your work and I will accept the answer. – Muton Dec 18 '19 at 16:18

3 Answers

11

This can be rather easily implemented using modern frameworks for neural networks like TensorFlow.

For example, a two-layer neural network using 100 neurons per layer trains in a few seconds on my computer and gives a good approximation:

[plot: the network's approximation of the sine function]

The code is also quite simple:

import tensorflow as tf
import numpy as np

with tf.name_scope('placeholders'):
    x = tf.placeholder('float', [None, 1])
    y = tf.placeholder('float', [None, 1])

with tf.name_scope('neural_network'):
    x1 = tf.contrib.layers.fully_connected(x, 100)
    x2 = tf.contrib.layers.fully_connected(x1, 100)
    result = tf.contrib.layers.fully_connected(x2, 1,
                                               activation_fn=None)

    loss = tf.nn.l2_loss(result - y)

with tf.name_scope('optimizer'):
    train_op = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Train the network
    for i in range(10000):
        xpts = np.random.rand(100) * 10
        ypts = np.sin(xpts)

        _, loss_result = sess.run([train_op, loss],
                                  feed_dict={x: xpts[:, None],
                                             y: ypts[:, None]})

        print('iteration {}, loss={}'.format(i, loss_result))
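
tf.contrib was removed in TensorFlow 2, so on a current installation the same idea can be sketched with tf.keras; the layer sizes below simply mirror the snippet above (the old fully_connected layers used ReLU by default), and the training settings are only illustrative:

import numpy as np
import tensorflow as tf

# Two hidden layers of 100 units and a linear output unit
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Random points in [0, 10) and their sine values, as in the training loop above
xpts = np.random.rand(10000, 1) * 10
model.fit(xpts, np.sin(xpts), epochs=20, batch_size=100)
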
Jonas Adler
    Your code actually implements a 3-layer neural network, not a 2-layer. The naming scheme includes the hidden layers and the output layer, so your three layers are `x1`, `x2`, and `result`. – stackoverflowuser2010 Nov 25 '17 at 22:24
5

You're definitely not trying for the impossible. Neural networks are universal approximators - meaning that for any continuous function F (on a bounded domain) and any error E > 0, there exists some neural network (needing only a single hidden layer) that can approximate F with error less than E.
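
For a single input and a tanh hidden layer (the setup in the question), one standard way to state this is:

\sup_{x \in [a, b]} \left| F(x) - \sum_{i=1}^{N} v_i \tanh(w_i x + b_i) \right| < E

for a sufficiently large number of hidden neurons N and suitable weights v_i, w_i, b_i.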

Of course, finding that (those) network(s) is a completely different matter. And the best I can tell you is trial and error... Here's the basic procedure (a rough code sketch follows the list):

  1. Split your data into two parts: a training set (~2/3) and a testing set (~1/3).
  2. Train your network on all of the items in the training set.
  3. Test (but don't train) your network on all the items in the testing set and record the average error.
  4. Repeat steps 2 and 3 until you've reached a minimum testing error (this happens with "overfitting" when your network starts to get super good at the training data to the detriment of everything else) or until your overall error ceases to notably decrease (implying the network's as good as it's going to get).
  5. If the error at this point is acceptably low, you're done. If not, your network isn't complex enough to handle the function you're training it for; add more hidden neurons and go back to the beginning...
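
A rough sketch of steps 1-4 with scikit-learn (the layer size, learning rate and patience threshold are arbitrary illustrative choices, not tuned values):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X = np.linspace(0, 2 * np.pi, 300).reshape(-1, 1)
y = np.sin(X).ravel()

# Step 1: ~2/3 training set, ~1/3 testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(10,), activation='tanh',
                   solver='sgd', learning_rate_init=0.01)

best_err, stalled = np.inf, 0
for epoch in range(5000):
    net.partial_fit(X_train, y_train)                         # Step 2: one training pass
    test_err = np.mean((net.predict(X_test) - y_test) ** 2)   # Step 3: record test error
    if test_err < best_err - 1e-6:
        best_err, stalled = test_err, 0
    else:
        stalled += 1
        if stalled > 100:                                      # Step 4: stop when it stops improving
            break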

Sometimes changing your activation function can make a difference, too (just don't use a linear activation in the hidden layers: a stack of linear layers collapses to a single linear map, which negates the power of adding more layers). But again, it'll be trial and error to see what works best.

Hope that helps (and sorry I can't be more useful)!

PS: I also know it's possible since I've seen someone approximate sine with a network. I want to say she wasn't using a sigmoid activation function, but I can't guarantee my memory on that count...

Xavier Holt
  • Thanks! This is actually what I'm already doing and I'm sorry if I was a bit unclear. I know it's *possible*, but was more trying to find out if the simple learning method I use is *feasible* for this particular problem? – Muton Dec 16 '12 at 13:04
  • @Muton - Gotcha. My only tip in that case would be to add a momentum term to your current setup. Should help on two fronts: It'll speed up learning a little and allow you to escape from some local minima. I can't imagine it would make a huge performance difference, though. – Xavier Holt Dec 16 '12 at 13:47
1

A similar implementation with sklearn.neural_network:

from sklearn.neural_network import MLPRegressor
import numpy as np

# Reshape a 1-D array into the (n_samples, 1) shape sklearn expects
f = lambda x: [[x_] for x_ in x]

noise_level = 0.1
X_train_ = np.arange(0, 10, 0.2)       # training inputs on [0, 10)
real_sin = np.sin(X_train_)
y_train = real_sin + np.random.normal(0, noise_level, len(X_train_))  # noisy targets

# Five hidden layers of N neurons each (default 'relu' activation, 'adam' solver)
N = 100
regr = MLPRegressor(hidden_layer_sizes=tuple([N] * 5)).fit(f(X_train_), y_train)
predicted_sin = regr.predict(f(X_train_))

The result looks something like this: [plot: predicted sin with NN]
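
To reproduce a similar picture, one option (assuming matplotlib is installed) is to plot the noisy training points, the true sine and the prediction from the snippet above:

import matplotlib.pyplot as plt

plt.scatter(X_train_, y_train, s=10, label='noisy training data')
plt.plot(X_train_, real_sin, label='sin(x)')
plt.plot(X_train_, predicted_sin, label='MLP prediction')
plt.legend()
plt.show()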

Sergey
  • Do you know why varying the value of the _periods_ (the `10` in `np.arange(0, 10, 0.2)`) can harm the estimation? I made a test varying this value and obtained [this](https://i.stack.imgur.com/OTf6z.png). – Hans Mar 02 '22 at 17:38
  • Which value did you use? The general point is that you need to have several harmonics to train the model. – Sergey Mar 04 '22 at 17:36
  • Hi @Sergey, I decided to follow the discussion in [AI StackExchange](https://ai.stackexchange.com/questions/34695/how-to-make-a-proper-approximation-of-sine-function-with-neural-networks/34697#34697), please let me know what you think :) – Hans Mar 05 '22 at 00:36
  • Hi @Hans, Neil Slater's answers are reasonable in my opinion. I made a maximally simple working example, which, on the other hand, is quite meaningless, since the task of approximating a periodic function can be solved using simpler methods (without neural networks at all). Regarding the proposed method, I would note that the result depends on a combination of the number of harmonics, the number of samples per harmonic, the number of neurons, and the hyperparameter values. With certain combinations, the model can give a poor result, which is not surprising. – Sergey Mar 09 '22 at 16:39
  • Unfortunately, a stable combination of parameters can only be found by exhaustive search. But, as I noted earlier, if you need a robust solution, then in general it is worth using other methods. – Sergey Mar 09 '22 at 16:39