
I am training a neural network (in C++, without any additional library) to learn a random wiggly function:


f(x) = 0.2 + 0.4x² + 0.3 sin(15x) + 0.05 cos(50x)

Plotted in Python as:

import math
import matplotlib.pyplot as plt

lim = 500
x, y = [], []

for i in range(lim):
    x.append(i)
    p = 2*3.14*i/lim
    y.append(0.2 + 0.4*(p*p) + 0.3*p*math.sin(15*p) + 0.05*math.cos(50*p))

plt.plot(x, y)
plt.show()

which corresponds to the following curve:

[Plot of the target function]

The same neural network has approximated the sine function quite well with a single hidden layer (5 neurons) and tanh activation, but I am unable to understand what's going wrong with this wiggly function, even though the Mean Square Error seems to dip (the error has been scaled up by 100 for visibility):

[MSE loss curve]

And this is the expected (GREEN) vs. predicted (RED) graph:

[Expected vs. predicted plot]

I suspect the normalization. This is how I did it:

I generated the training data as:

int numTrainingSets = 100;
double MAXX = -9999999999999999;

for (int i = 0; i < numTrainingSets; i++)
    {
        double p = (2*PI*(double)i/numTrainingSets);
        training_inputs[i][0] = p;  //INSERTING DATA INTO i'th EXAMPLE, 0th INPUT (Single input)
        training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p); //Single output

        ///FINDING NORMALIZING FACTOR (IN INPUT AND OUTPUT DATA)
        for(int m=0; m<numInputs; ++m)
            if(MAXX < training_inputs[i][m])
                MAXX = training_inputs[i][m];   //FINDING MAXIMUM VALUE IN INPUT DATA
        for(int m=0; m<numOutputs; ++m)
            if(MAXX < training_outputs[i][m])
                MAXX = training_outputs[i][m];  //FINDING MAXIMUM VALUE IN OUTPUT DATA

        ///NORMALIZE BOTH INPUT & OUTPUT DATA USING THIS MAXIMUM VALUE 
        ////DO THIS FOR INPUT TRAINING DATA
        for(int m=0; m<numInputs; ++m)
            training_inputs[i][m] /= MAXX;
        ////DO THIS FOR OUTPUT TRAINING DATA
        for(int m=0; m<numOutputs; ++m)
            training_outputs[i][m] /= MAXX;
    }

This is what the model trains on. The validation/test data is generated as follows:

int numTestSets = 500;
    for (int i = 0; i < numTestSets; i++)
    {
        //NORMALIZING TEST DATA USING THE SAME "MAXX" VALUE 
        double p = (2*PI*i/numTestSets)/MAXX;
        x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

        ///Actual Result
        double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
        y1.push_back(res);  //FORMS THE GREEN CURVE FOR PLOTTING

        ///Predicted Value
        double temp[1];
        temp[0] = p;
        y2.push_back(MAXX*predict(temp));  //FORMS THE RED CURVE FOR PLOTTING, scaled up to de-normalize 
    }

Is this normalization right? If yes, what could be going wrong? If not, what should be done?

Pe Dro
  • That’s not a graph of that function (absent, perhaps, some creative placement of parentheses and use of degrees). – Davis Herring Nov 27 '19 at 01:52
  • Hello. If you are referring to the GREEN curve not matching the wiggly curve (BLUE) shown above, you are mistaken. They are both the same curve; it's just that the GREEN one has been plotted for a smaller range of x-axis data. – Pe Dro Nov 27 '19 at 02:10
  • I mean the very first, visibly oscillatory, blue plot. – Davis Herring Nov 27 '19 at 02:11
  • Yes, the function used is: p = 2*3.14*i/500 , 0.2+0.4*(p^2)+0.3*p*math.sin(15*p)+0.05*math.cos(50*p) , where 500 is the number of data points. Look Here: https://colab.research.google.com/drive/1taV2Yna6bBiRaLaT1EeSZJA8-G2i8ttx – Pe Dro Nov 27 '19 at 02:23
  • @DavisHerring, does it cause the fault? – Pe Dro Nov 27 '19 at 02:51
  • I don’t know; I just pointed out the discrepancy between your formula and your plot in case it indicated a relevant misunderstanding. – Davis Herring Nov 27 '19 at 03:58

2 Answers


There's nothing wrong with using that normalization, unless you use a fancy weight initialization for the neural network. It rather seems that something goes wrong during training, but without further details on that side it's hard to pinpoint the problem.

I ran a quick cross-check using TensorFlow (MSE loss, Adam optimizer) and it does converge in that case:

[Function plot]

[Loss curve]

Here's the code for reference:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf


x = np.linspace(0, 2*np.pi, 500)
y = 0.2 + 0.4*x**2 + 0.3*x*np.sin(15*x) + 0.05*np.cos(50*x)


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.h1 = tf.keras.layers.Dense(5, activation='tanh')
        self.out = tf.keras.layers.Dense(1, activation=None)

    def call(self, x):
        return self.out(self.h1(x))


model = Model()
loss_object = tf.keras.losses.MeanSquaredError()
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_object(y, model(x))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)


# Normalize data.
x /= y.max()
y /= y.max()
data_set = tf.data.Dataset.from_tensor_slices((x[:, None], y[:, None]))
train_ds = data_set.shuffle(len(x)).batch(64)

loss_history = []
for epoch in range(5000):
    for train_x, train_y in train_ds:
        train_step(train_x, train_y)

    loss_history.append(train_loss.result())
    print(f'Epoch {epoch}, loss: {loss_history[-1]}')
    train_loss.reset_states()

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.plot(loss_history)

plt.figure()
plt.plot(x, y, label='original')
plt.plot(x, model(list(data_set.batch(len(x)))[0][0]), label='predicted')
plt.legend()
plt.show()
a_guest
  • Thanks for the efforts. I found the problem. It was correct normalization implemented with errors. I shall be posting the answer soon. – Pe Dro Nov 27 '19 at 12:21

It turned out the normalization approach was right, but the implementation had mistakes: 1) I was finding the normalizing factor correctly, but I was also normalizing each example inside the same loop, so the earlier examples were divided by a MAXX that had not yet seen the later, larger values. I had to change this:

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find and update Normalization factor(as shown in the question)

    //Normalize the training example
 }

to

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find Normalization factor (as shown in the question)
 }

  for (int i = 0; i < numTrainingSets; i++)
 {    
    //Normalize the training example
 }
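
For clarity, here is a minimal sketch of that two-pass structure. It reuses the variable names and data-generation line from the question (training_inputs, training_outputs, numInputs, numOutputs, PI, MAXX); only the loop structure changes:

///PASS 1: generate the data and find the maximum over ALL examples
double MAXX = -9999999999999999;
for (int i = 0; i < numTrainingSets; i++)
{
    double p = (2*PI*(double)i/numTrainingSets);
    training_inputs[i][0] = p;
    training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);

    for(int m=0; m<numInputs; ++m)
        if(MAXX < training_inputs[i][m])
            MAXX = training_inputs[i][m];
    for(int m=0; m<numOutputs; ++m)
        if(MAXX < training_outputs[i][m])
            MAXX = training_outputs[i][m];
}

///PASS 2: only now divide every example by the final MAXX,
///so early examples are scaled by the same factor as late ones
for (int i = 0; i < numTrainingSets; i++)
{
    for(int m=0; m<numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    for(int m=0; m<numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}

With the single-loop version, the first examples are divided by a much smaller MAXX than the last ones, which distorts the training targets.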

Also, the validation set was earlier generated as:

int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //Generate data
    double p = (2*PI*i/numTestSets)/MAXX;
    //And other steps...
}

whereas the training data was generated with numTrainingSets = 100. Hence, the p values generated for the training set and for the validation set lay in different ranges, so I had to set numTestSets = numTrainingSets.

Lastly,

Is this normalization right?

I had been wrongly normalizing the actual result too! As shown in the question:

double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);       

Notice: the p used to generate this actual result has been normalized (unnecessarily).
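
Putting both fixes together, the corrected validation loop looks roughly like this (a sketch that reuses MAXX, x, y1, y2 and predict() from the question; only the handling of p changes):

int numTestSets = numTrainingSets;  //same sampling as the training data
for (int i = 0; i < numTestSets; i++)
{
    double p = (2*PI*(double)i/numTestSets);    //raw, UN-normalized value
    x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

    ///Actual Result: computed from the raw p, not the normalized one
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res);  //FORMS THE GREEN CURVE FOR PLOTTING

    ///Predicted Value: the network sees the normalized input,
    ///and its output is scaled back up by MAXX
    double temp[1];
    temp[0] = p/MAXX;
    y2.push_back(MAXX*predict(temp));   //FORMS THE RED CURVE FOR PLOTTING
}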

This is the final result after resolving these issues:

[Expected vs. predicted plot after the fixes]

Pe Dro