
I am trying to scale (normalize) the inputs of my training dataset using sklearn.preprocessing.MinMaxScaler(). After normalizing the input, the plotted output looks very strange and no longer resembles the input before normalization. Please have a look at the issue. Reference for normalizing pandas columns: https://stackoverflow.com/a/26415620/4948889

import pandas as pd
import sklearn.preprocessing

def normalize_data(df):
    # Use the columns of the frame passed in, not the global df_stock
    cols = list(df.columns.values)
    x = df.values  # returns a numpy array
    # MinMaxScaler scales each column independently to [0, 1]
    min_max_scaler = sklearn.preprocessing.MinMaxScaler()
    x_scaled = min_max_scaler.fit_transform(x)
    df = pd.DataFrame(x_scaled)
    df.columns = cols
    return df

df_stock = df.copy()
cols = list(df_stock.columns.values)
print('df_stock.columns.values = ', cols)
df_stock_norm = df_stock.copy()
df_stock_norm = normalize_data(df_stock_norm)

See the before and after graphs of the input dataframes. Before:

import matplotlib.pyplot as plt
plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(df.open.values[:20], color='red', label='open')
plt.plot(df.close.values[:20], color='green', label='close')
plt.plot(df.low.values[:20], color='blue', label='low')
plt.plot(df.high.values[:20], color='black', label='high')
plt.title('stock price')
plt.xlabel('time [days]')
plt.ylabel('price')
plt.legend(loc='best')
plt.show()

output:
[image: plot before normalization]

After:

plt.figure(figsize=(15, 5))
plt.plot(df_stock_norm.open.values[:20], color='red', label='open')
plt.plot(df_stock_norm.close.values[:20], color='green', label='close')
plt.plot(df_stock_norm.low.values[:20], color='blue', label='low')
plt.plot(df_stock_norm.high.values[:20], color='black', label='high')
plt.title('stock')
plt.xlabel('time [days]')
plt.ylabel('normalized price')
plt.legend(loc='best')
plt.show()

output:
[image: plot after normalization]

Explanation:
Please compare the high and low values in the two graphs. After normalization they look odd and differ from the originals, whereas I expected the lines to keep the same relationship to each other even after normalization.

Please let me know what I did wrong.
EDITED:

Here is a sample dataset for testing the above.

Jaffer Wilson
  • Why the downvote? Is my question not an MCVE, or is there something else? I find that when people don't want to answer a question they downvote and demotivate the asker. At least leave a comment with the downvote so I can improve. – Jaffer Wilson Aug 03 '18 at 09:12
  • If each series is normalised in isolation then it wouldn't surprise me if their relative weights no longer reflect the original data. I have not used that method before, but I'm not sure I'm surprised by the output – roganjosh Aug 03 '18 at 09:24
  • @VivekKumar Why confused? See the lines of the above and below graphs. You will understand the problem. – Jaffer Wilson Aug 03 '18 at 10:37
  • @VivekKumar Let me widen the "before" graph to make it clearer, OK? – Jaffer Wilson Aug 03 '18 at 10:38
  • @VivekKumar Now the graphs no longer look similar. Please have a look at them. – Jaffer Wilson Aug 03 '18 at 10:39
  • Oh, OK. That's expected. That's because MinMaxScaler scales each column individually (as per our previous discussion). So, depending on the min and max values of each column in the original data, the scaled values can shift a bit relative to each other. – Vivek Kumar Aug 03 '18 at 10:45
  • @VivekKumar But that bit is spoiling the data; see the graphs. If I use these values for training, my training will be wasted. I guess you have experience with machine learning: the output depends on what we give as input. If I give wrong data, then I will get wrong output. Don't you think the same? – Jaffer Wilson Aug 03 '18 at 10:48
  • @JafferWilson test it? You're treating related variables as independent variables. "high" and "low" are directly related to the sampling method and you're normalising them in isolation. You're not capturing the relationship that they share. One value _must_ be the maximum of each period and one value _must_ be the minimum of that period. – roganjosh Aug 03 '18 at 10:51
  • The interest is probably in the spread and some momentum of which way the market is moving, not absolute values of maximum and minimum considered in isolation. – roganjosh Aug 03 '18 at 10:54
  • @roganjosh So what am I supposed to do? Could you please help me with that? – Jaffer Wilson Aug 03 '18 at 10:54
  • @JafferWilson my advice is limited to the comment I wrote while you were writing yours. I bombed on the FOREX but you're definitely not using the inference available to you. – roganjosh Aug 03 '18 at 11:03
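
The point made in the comments above, that MinMaxScaler scales each column on its own and so can erase the relationship between "high" and "low", can be illustrated with a small sketch. The toy prices below are made up for illustration and are not the asker's dataset, and the shared-scale variant (one global min and max for all price columns) is only one possible way to keep the columns comparable, not something the commenters prescribed.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy prices (assumption: not taken from the original dataset).
df = pd.DataFrame({
    "open":  [10.0, 11.0, 12.0, 11.5],
    "close": [10.5, 11.5, 11.8, 12.0],
    "low":   [9.5, 10.5, 11.0, 11.0],
    "high":  [11.0, 12.0, 12.5, 12.5],
})

# Per-column scaling: each column is mapped to [0, 1] using its own min and max,
# so the gap between "high" and "low" is not preserved.
per_column = pd.DataFrame(
    MinMaxScaler().fit_transform(df.values),
    columns=df.columns,
)

# Shared scaling: a single min and max taken over all price columns,
# so every column is shifted and stretched by the same amount.
global_min = df.values.min()
global_max = df.values.max()
shared = (df - global_min) / (global_max - global_min)

print(per_column[["low", "high"]])
print(shared[["low", "high"]])
```

In this toy data the per-column version maps "low" and "high" onto the same curve, while the shared version keeps "high" above "low" for every row. Whether a shared scale suits a given model is a separate question that depends on the features being used.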

0 Answers