I have been doing a little research on different function-approximation methods, and the first one I tried is an ANN (artificial neural network). The code is as follows -
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from sklearn.preprocessing import MinMaxScaler
# 20,000 evenly spaced points covering one full period of sine
X = np.linspace(0.0, 2.0 * np.pi, 20000).reshape(-1, 1)
Y = np.sin(X)
# scale both inputs and targets to [0, 1]
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X = x_scaler.fit_transform(X)
Y = y_scaler.fit_transform(Y)
plt.plot(X, Y)
plt.show()
# input shape is (20000, 1), i.e. the whole 20,000-point curve is treated as one sample
inp = Input(shape=(20000, 1))
x = Dense(32, activation='relu')(inp)
x = Dense(64, activation='relu')(x)
x = Dense(128, activation='relu')(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(1, activation='linear')(x)
model = Model(inp, predictions)
model.compile(loss='mse', optimizer='adam')
model.summary()
# reshape the data into a single sample of shape (1, 20000, 1)
X = X.reshape((-1, 20000, 1))
Y = Y.reshape((-1, 20000, 1))
history = model.fit(X, Y, epochs=500, batch_size=32, verbose=2)
# the test grid is the same 20,000 points that were used for training
X_test = np.linspace(0.0, 2.0 * np.pi, 20000).reshape(-1, 1)
X_test.shape
X_test = x_scaler.transform(X_test)
X_test = X_test.reshape((-1, 20000, 1))
res = model.predict(X_test, batch_size=32)
res = res.reshape((20000, 1))
# undo the min-max scaling before plotting
res_rscl = y_scaler.inverse_transform(res)
Y_rscl = y_scaler.inverse_transform(Y.reshape(20000, 1))
plt.subplot(211)
plt.plot(res_rscl, label='ann')
plt.plot(Y_rscl, label='train')
plt.xlabel('#')
plt.ylabel('value [arb.]')
plt.legend()
plt.subplot(212)
plt.plot(Y_rscl - res_rscl, label='diff')
plt.legend()
plt.show()
The plots look like the following -
As we can see, it indeed approximates the sine curve very well with this architecture. However, I am not really sure I am doing the right thing. It looks strange to me that I need 43,777 parameters to fit the sine curve. Maybe I am wrong. However, looking at this R code (I do not know R at all, but I am guessing that its ANN is much smaller than mine) makes me wonder even more.
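(For reference, that count does add up from the layer sizes I used: (1*32 + 32) + (32*64 + 64) + (64*128 + 128) + (128*256 + 256) + (256*1 + 1) = 64 + 2112 + 8320 + 33024 + 257 = 43,777.)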
My question - Is my approach right? Should I change something so that the number of parameters is smaller? Or is it normal that sine is a difficult function for an ANN and it simply takes a good number of parameters to approximate it?
This may be a somewhat open-ended question, but I would really appreciate any direction you can point me to and any mistakes of mine that you can show me.
Note - This question suggests that the cyclic nature of the data is what makes it hard for an ANN. I would also like to know whether this is really the case, and whether that is the reason the ANN takes so many parameters.
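For comparison, this is the per-sample formulation I have been considering, where each x value is a separate training example instead of the whole 20,000-point curve being a single sample. I have not verified that this is what the R code does, and the layer sizes here are just a guess -

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# each sample is a single scalar x, so the input shape is (1,)
inp_small = Input(shape=(1,))
h = Dense(16, activation='relu')(inp_small)
h = Dense(16, activation='relu')(h)
out = Dense(1, activation='linear')(h)
small_model = Model(inp_small, out)
small_model.compile(loss='mse', optimizer='adam')
small_model.summary()  # (1*16 + 16) + (16*16 + 16) + (16*1 + 1) = 321 parameters

# X and Y would keep their natural (20000, 1) shape here, with no reshape to (1, 20000, 1)
# small_model.fit(X, Y, epochs=500, batch_size=32, verbose=2)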