
I am using standardized predictors from the training set to train an LSTM model. After predicting the outcome in the test set, I need to transform the predicted scores back to the original scale. Normally I would just compute predicted score * SD of the training outcome + mean of the training outcome. However, with the LSTM each feature in the training set is measured at multiple time steps, so the standardization returns multiple means and SDs. My questions are:

  1. How do I transform the predicted outcome back to the original scale in Python when there are multiple means and SDs?

  2. Alternatively, should I normalize the predictors and outcome in some other way, so that the outcome's scale can be reversed easily? What normalization approach would you recommend?

  3. In a 3-D array like mine, how do I get a single mean and SD for each feature?
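To illustrate question 3: I think what I need is a single mean and SD per feature, pooled over both the sample and time axes, but I am not sure this is the right approach. A sketch with made-up shapes matching my data:

```python
import numpy as np

# dummy array shaped like my training predictors: (samples, timesteps, features)
train_x = np.random.rand(10, 144, 3)

# pooling axes 0 and 1 collapses samples and timesteps,
# leaving one statistic per feature
means_x = train_x.mean(axis=(0, 1))  # shape (3,)
stds_x = train_x.std(axis=(0, 1))    # shape (3,)
print(means_x.shape, stds_x.shape)   # (3,) (3,)
```

Is this correct, or should the statistics be computed on the 2-D data before reshaping to 3-D?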

Thank you very much.
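Regarding question 2, one alternative I considered (not used in the code below) is min-max scaling with a single min/max computed over all samples and time steps, since that would also leave a single pair of statistics to undo later. A sketch of what I mean, with made-up shapes:

```python
import numpy as np

train_y = np.random.rand(10, 144, 1)  # dummy outcome array

# single min/max computed over all samples and timesteps
y_min = train_y.min()
y_max = train_y.max()

s_train_y = (train_y - y_min) / (y_max - y_min)  # scaled to [0, 1]
recovered = s_train_y * (y_max - y_min) + y_min  # exact inverse
print(np.allclose(recovered, train_y))           # True
```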

Please see the reproducible Python code and output below:


>>> import pandas as pd
>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import LSTM, Dense
>>> from matplotlib import pyplot

>>> dat=pd.DataFrame(np.random.rand(2880*4).reshape(2880,4), columns = ['y','x1','x2','x3'])
>>> x=dat.iloc[:,[1,2,3]]
>>> y=dat.iloc[:,0]
>>> dat.head()
          y        x1        x2        x3
0  0.045795  0.974471  0.916503  0.208624
1  0.398229  0.628749  0.630672  0.672327
2  0.015625  0.164637  0.041553  0.057597
3  0.516001  0.377016  0.752409  0.040648
4  0.451607  0.074149  0.413406  0.245180
>>> dat.shape
(2880, 4)

>>> x=x.values 
>>> y=y.values
>>> y=np.reshape(y,[y.shape[0],1])
>>> train_x=x[0:1440]
>>> train_y=y[0:1440]
>>> test_x=x[1440:2880]
>>> test_y=y[1440:2880]
>>> 
>>> train_x=np.reshape(train_x,[-1,144,train_x.shape[1]])
>>> train_y=np.reshape(train_y,[-1,144,train_y.shape[1]])
>>> test_x=np.reshape(test_x,[-1, 144, test_x.shape[1]])
>>> test_y=np.reshape(test_y,[-1, 144, test_y.shape[1]])
>>> print(train_x.shape,train_y.shape,test_x.shape,test_y.shape)
(10, 144, 3) (10, 144, 1) (10, 144, 3) (10, 144, 1)
>>> 
>>> 
>>> means_x=np.mean(train_x,axis=0) # standardization
>>> means_y=np.mean(train_y,axis=0)
>>> stds_x=np.std(train_x,axis=0)
>>> stds_y=np.std(train_y,axis=0)
>>> s_train_x=(train_x-means_x)/stds_x
>>> s_test_x=(test_x-means_x)/stds_x
>>> s_train_y=(train_y-means_y)/stds_y
>>> s_test_y=(test_y-means_y)/stds_y
>>> 
>>> model=Sequential()
>>> model.add(LSTM(10,input_shape=(s_train_x.shape[1],s_train_x.shape[2])))
>>> model.add(Dense(1))
>>> model.compile(loss='mae',optimizer='adam')
>>> history=model.fit(s_train_x,s_train_y,epochs=50,batch_size=100,validation_data=(s_test_x, s_test_y), shuffle=False)

>>> 
>>> y_pred=model.predict(s_test_x)
>>> 
>>> y_pred
array([[-0.23145649],
       [-0.11043324],
       [-0.10545453],
       [ 0.13147753],
       [-0.2414865 ],
       [ 0.04826045],
       [-0.35677138],
       [-0.11905774],
       [-0.01755336],
       [ 0.16642463]], dtype=float32)
>>> 
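For question 1, what I would like is a reversal as simple as the one below, which only works when y has a single mean and SD. The scalar statistics here are made up just to show the operation I mean:

```python
import numpy as np

# dummy predictions, shaped like my model output
y_pred = np.array([[-0.23], [0.13], [0.05]], dtype=np.float32)

# hypothetical scalar statistics of the unscaled training outcome
mean_y = 0.5
std_y = 0.29

# inverse of z-score standardization: original = z * SD + mean
y_pred_orig = y_pred * std_y + mean_y
print(y_pred_orig.shape)  # (3, 1)
```

With the (144, 1)-shaped `means_y` and `stds_y` from my code above, this broadcast does not line up with the (10, 1) predictions, which is exactly my problem.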