
I have been experimenting with different kinds of ANNs to do regression on basic and increasingly complex functions. However, I cannot get my network to learn cyclic functions like a sine wave. I have read on the web and on this forum that ANNs are generally not good at this job, but I can't fathom why. Isn't learning any function within its domain the same?

For clarification, I am trying to fit a sine wave from x=0 to x=100 using the following setup:

import tensorflow as tf
from tensorflow import keras

def create_model():
    # Stack of fully connected layers mapping one input value to one output value
    model = tf.keras.models.Sequential([
        keras.layers.Dense(units=1, activation=None, input_dim=1, kernel_initializer='random_normal'),
        keras.layers.Dense(units=64, activation='linear', use_bias=True),
        keras.layers.Dense(units=32, activation='relu', use_bias=True),
        keras.layers.Dense(units=64, activation='relu'),
        keras.layers.Dense(units=64, activation='linear', use_bias=True),
        keras.layers.Dense(units=32, activation='relu'),
        keras.layers.Dense(units=1, activation='sigmoid'),
    ])

    model.compile(optimizer='adam',
                  loss='mean_squared_logarithmic_error',
                  metrics=['mean_squared_error'])

    return model

# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()

I have normalized (rescaled) my data to fit into the [x,y]=[0,1]^2 space and fed it into the network. I gave the network 1000 points and left it to train for many epochs (~100,000), and these are the results I got:

[Figure: Overfitting Predictions]
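For reference, the data preparation described above could look roughly like the following. This is only a sketch: the evenly spaced sampling via `np.linspace` and the helper name `min_max_scale` are assumptions for illustration, since the exact sampling and scaling code is not shown here.

```python
import numpy as np

# Assumption: 1000 evenly spaced samples on [0, 100]; the real sampling may differ.
N_POINTS = 1000
x_raw = np.linspace(0.0, 100.0, N_POINTS)
y_raw = np.sin(x_raw)


def min_max_scale(values):
    """Linearly rescale an array so its minimum maps to 0 and its maximum to 1."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)


# Keras Dense layers expect 2-D input of shape (n_samples, n_features).
x_train = min_max_scale(x_raw).reshape(-1, 1)
y_train = min_max_scale(y_raw).reshape(-1, 1)

print(x_train.shape, y_train.shape)
```

Arrays shaped like this can be passed directly to `model.fit(x_train, y_train, ...)`. Scaling the targets into [0, 1] is what makes the sigmoid output layer usable at all, because a sigmoid cannot produce the negative half of an unscaled sine wave.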

I can understand that this is standard overfitting behavior, but I can't understand why the network behaves this way. In Goodfellow's Deep Learning (which I am in the process of reading), he explains that the optimal behavior of a machine learning algorithm lies between the overfitting and underfitting regions. It seems, then, that the model I have created will not converge to the solution with further training and is expected to perform worse!

Does this mean it can't interpolate the sine function? Also, why is this function so much more demanding computationally than others (most simple functions I tried converged in <1000 epochs)? Does it mean it requires more layers, or maybe more units per layer? I understand the problem to be a classic regression problem, for which I thought sequential models were good.

Last but not least, I know that ANNs are not the way to go for periodic functions, but I am trying to understand why they struggle with this as a regression method.
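To put a number on how much structure the target has over this range, here is a small standard-library sketch that counts the turning points (local maxima and minima) of the sampled sine wave. The evenly spaced sampling is an assumption, and the variable names are only illustrative.

```python
import math

# Assumption: 1000 evenly spaced samples on [0, 100].
N_POINTS = 1000
xs = [100.0 * i / (N_POINTS - 1) for i in range(N_POINTS)]
ys = [math.sin(x) for x in xs]

# Differences between consecutive samples; a sign flip between two
# neighbouring differences marks a local maximum or minimum.
diffs = [b - a for a, b in zip(ys, ys[1:])]
n_turning_points = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)

# Number of full periods of sin(x) that fit into the interval.
n_periods = 100.0 / (2.0 * math.pi)

print(f"periods in [0, 100]: {n_periods:.2f}")
print(f"turning points: {n_turning_points}")
```

Since ReLU layers compute a piecewise-linear function, each of these turning points needs at least one bend in the fitted curve, which may be part of why this target is so much harder than the simpler functions I tried.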

desertnaut
Tryfonas