I am trying to scale (normalize) the inputs of my training dataset using `sklearn.preprocessing.MinMaxScaler()`. After normalizing the input, I got a very strange output graph that does not look like the input before normalization. Please have a look at the issue below.

Reference for normalizing pandas columns: https://stackoverflow.com/a/26415620/4948889
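```python
import pandas as pd
import sklearn.preprocessing

def normalize_data(df):
    cols = list(df.columns.values)  # use the argument's columns, not the global df_stock
    x = df.values  # returns a numpy array
    min_max_scaler = sklearn.preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)  # fits a separate min/max for each column
    df = pd.DataFrame(x_scaled, index=df.index)  # keep the original index
    df.columns = cols
    return df

# df is the stock price DataFrame loaded earlier (columns include open, close, low, high)
df_stock = df.copy()
cols = list(df_stock.columns.values)
print('df_stock.columns.values = ', cols)
df_stock_norm = df_stock.copy()
df_stock_norm = normalize_data(df_stock_norm)
```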
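For reference, here is a small sketch of what `MinMaxScaler` does with a multi-column input. It uses a made-up two-column DataFrame (`toy`, hypothetical data for illustration only, not the dataset from this question). The scaler learns a separate min and max for every column, so each column is mapped to [0, 1] on its own scale, and the ordering between columns in a given row is not guaranteed to survive.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy prices, for illustration only
toy = pd.DataFrame({
    'low':  [10.0, 11.0, 12.0],
    'high': [11.0, 15.0, 13.0],
})

scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(toy), columns=toy.columns, index=toy.index)

# Each column gets its own min and max
print(scaler.data_min_)  # [10. 11.]
print(scaler.data_max_)  # [12. 15.]

# In the last row the original high (13) is above low (12),
# but after per-column scaling low becomes 1.0 and high becomes 0.5
print(scaled.iloc[2])
```

Each column on its own keeps its shape (the mapping is linear per column), but comparisons across columns such as "high is above low" can flip.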
Here are the before and after graphs of the input DataFrames.

Before:
```python
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5));
plt.subplot(1, 2, 1);
plt.plot(df.open.values[:20], color='red', label='open')
plt.plot(df.close.values[:20], color='green', label='close')
plt.plot(df.low.values[:20], color='blue', label='low')
plt.plot(df.high.values[:20], color='black', label='high')
plt.title('stock price')
plt.xlabel('time [days]')
plt.ylabel('price')
plt.legend(loc='best')
plt.show()
```
After:
```python
plt.figure(figsize=(15, 5));
plt.plot(df_stock_norm.open.values[:20], color='red', label='open')
plt.plot(df_stock_norm.close.values[:20], color='green', label='close')
plt.plot(df_stock_norm.low.values[:20], color='blue', label='low')
plt.plot(df_stock_norm.high.values[:20], color='black', label='high')
plt.title('stock')
plt.xlabel('time [days]')
plt.ylabel('normalized price')
plt.legend(loc='best')
plt.show()
```
Explanation:
Look at the high and low curves in both graphs. After normalization they look strange and differ from the originals, although I expected them to keep the same pattern relative to each other even after normalization.
Please let me know what I did wrong.
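If the goal is to keep the relationships between the price columns intact (for example high staying at or above low), one possible approach, sketched here as an assumption rather than the only fix, is to scale all price columns with a single shared min and max instead of one per column. The helper `normalize_prices_jointly` and the `toy` data below are hypothetical, for illustration only. It still uses `MinMaxScaler`, but fits it on all price values flattened into one feature.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def normalize_prices_jointly(df, cols=('open', 'close', 'low', 'high')):
    """Hypothetical helper: min-max scale `cols` with ONE shared min/max."""
    cols = list(cols)
    out = df.copy()
    values = df[cols].to_numpy(dtype=float)
    scaler = MinMaxScaler()
    # Flatten to a single feature so the scaler learns one global min and max
    scaler.fit(values.reshape(-1, 1))
    out[cols] = scaler.transform(values.reshape(-1, 1)).reshape(values.shape)
    return out, scaler

# Hypothetical toy prices, for illustration only
toy = pd.DataFrame({
    'open':  [10.5, 12.0, 12.5],
    'close': [10.8, 14.0, 12.2],
    'low':   [10.0, 11.0, 12.0],
    'high':  [11.0, 15.0, 13.0],
})
scaled, scaler = normalize_prices_jointly(toy)
print(scaled)
# One increasing linear map is applied to every column, so high >= low is preserved
print((scaled['high'] >= scaled['low']).all())
```

Returning the fitted scaler makes it possible to map predictions back to prices later with `inverse_transform` (after reshaping to a single column the same way).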
EDIT:
Here is a sample dataset for testing the above: