MLP with Scikitlearn: Artificial Neural Network application for forecast

Question

I have traffic data and I want to predict number of vehicles for the next hour by showing the model these inputs: this hour's number of vehicles and this hour's average speed value. Here is my code:

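import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

dataset = pd.read_csv('/content/final - Sayfa5.csv', delimiter=',')
dataset = dataset[['MINIMUM_SPEED', 'MAXIMUM_SPEED', 'AVERAGE_SPEED', 'NUMBER_OF_VEHICLES', '1_LAG_NO_VEHICLES']]
X = np.array(dataset.iloc[:, 1:4])
L = len(dataset)
Y = np.array([dataset.iloc[:, 4]])
Y = Y[:, 0:L]
Y = np.transpose(Y)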

#scaling with MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X)
X = scaler.transform(X)
 
scaler.fit(Y)
Y = scaler.transform(Y)
print(X,Y)

X_train , X_test, Y_train, Y_test = train_test_split(X,Y,test_size=0.3)
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error 
mlp = MLPRegressor(activation='logistic')
mlp.fit(X_train,Y_train)
predictions = mlp.predict(X_test)
predictions1=mlp.predict(X_train)
print("mse_test :" ,mean_squared_error(Y_test,predictions), "mse_train :",mean_squared_error(Y_train,predictions1))

I got good mse values such as mse_test : 0.005467816018933008 mse_train : 0.005072774796622158

But I am confused on two points:

Should I scale the y values? I have read many blog posts saying that one should not scale the Ys, only X_train and X_test. But when I leave Y unscaled I get very bad MSE scores such as 49, 50, 100 or even more.
How can I get predictions for the future in the original units rather than as scaled values? For example, I wrote:

    Xnew=[[ 80 , 40 , 47],
    [ 80 , 30,  81],
    [ 80 , 33, 115]]
    Xnew = scaler.transform(Xnew)
    print("prediction for that input is" , mlp.predict(Xnew))

But I got scaled values such as : prediction for that input is [0.08533431 0.1402755 0.19497315]

It should have been like this [81,115,102].

Just as a side note: usually a lagged variable means looking at the PREVIOUS period. If your variables are named correctly, your y should be `dataset.iloc[:, 3]` — TiTo, Dec 18 '20 at 16:03
You are so right, I misused the concept. Thanks for the correction — B. Selin Zaza, Dec 19 '20 at 20:37

score 2 · Accepted Answer · answered Dec 18 '20 at 18:18

Congrats on using sklearn's MLPRegressor; an introduction to neural networks is always a good thing.

Scaling your input data is critical for neural networks. Consider reviewing Chapter 11 of Ethem Alpaydin's Introduction to Machine Learning. This is also covered in great detail in the Efficient BackProp paper. To put it plainly, it is critical to scale the input data so that your model learns how to target an output.

In plain English, scaling in this case means converting your data into values between 0 and 1 (inclusive). A good Stats Exchange post on this describes the differences between scaling methods. MinMax scaling keeps the shape of your data's distribution, which also means it is sensitive to outliers. More robust methods (described in that post) exist in sklearn, such as RobustScaler.
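To make the outlier point concrete, here is a minimal sketch comparing the two scalers on a made-up feature (the values in `x` are hypothetical, not from the question's dataset). It only illustrates the general behaviour: MinMaxScaler squeezes the ordinary values together when one extreme value sets the range, while RobustScaler centres on the median and scales by the interquartile range. Note that RobustScaler does not bound its output to 0...1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Hypothetical single feature with one large outlier (the value 1000).
x = np.array([[10.0], [12.0], [11.0], [13.0], [1000.0]])

# MinMaxScaler maps the minimum to 0 and the maximum to 1.
minmax_scaled = MinMaxScaler().fit_transform(x)

# RobustScaler subtracts the median and divides by the interquartile range.
robust_scaled = RobustScaler().fit_transform(x)

print("MinMax:", minmax_scaled.ravel())
print("Robust:", robust_scaled.ravel())
```

With MinMax scaling the four ordinary values end up packed very close to 0, whereas the robust version keeps them spread out around 0 and pushes only the outlier far away.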

So take for example a very basic dataset like this:

| Feature 1 | Feature 2 | Feature 3 | Feature 4 | Feature 5 | Target |
|:---------:|:---------:|:---------:|:---------:|:---------:|:------:|
|     1     |     17    |     22    |     3     |     3     |   53   |
|     2     |     18    |     24    |     5     |     4     |   54   |
|     1     |     11    |     22    |     2     |     5     |   96   |
|     5     |     20    |     22    |     7     |     5     |   59   |
|     3     |     10    |     26    |     4     |     5     |   66   |
|     5     |     14    |     30    |     1     |     4     |   63   |
|     2     |     17    |     30    |     9     |     5     |   93   |
|     4     |     5     |     27    |     1     |     5     |   91   |
|     3     |     20    |     25    |     7     |     4     |   70   |
|     4     |     19    |     23    |     10    |     4     |   81   |
|     3     |     13    |     8     |     19    |     5     |   14   |
|     9     |     18    |     3     |     67    |     5     |   35   |
|     8     |     12    |     3     |     34    |     7     |   25   |
|     5     |     15    |     6     |     12    |     6     |   33   |
|     2     |     13    |     2     |     4     |     8     |   21   |
|     4     |     13    |     6     |     28    |     5     |   46   |
|     7     |     17    |     7     |     89    |     6     |   21   |
|     4     |     18    |     4     |     11    |     8     |    5   |
|     9     |     19    |     7     |     21    |     5     |   30   |
|     6     |     14    |     6     |     17    |     7     |   73   |

I can slightly modify your code to play with this:

import pandas as pd, numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import mean_squared_error 

df = pd.read_clipboard()

# Build data
y = df['Target'].to_numpy()
scaled_y = df['Target'].values.reshape(-1, 1) #returns a numpy array
df.drop('Target', inplace=True, axis=1)
X = df.to_numpy()

#scaling with RobustScaler
scaler = RobustScaler()
X = scaler.fit_transform(X)

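# Scaling y just to show you the difference.
# A separate scaler is used so the scaler fitted on X is not overwritten.
y_scaler = RobustScaler()
scaled_y = y_scaler.fit_transform(scaled_y).ravel()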

# Set random_state so we can replicate results
X_train , X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=8)
scaled_X_train , scaled_X_test, scaled_y_train, scaled_y_test = train_test_split(X,scaled_y,test_size=0.2, random_state=8)

mlp = MLPRegressor(activation='logistic')
scaled_mlp = MLPRegressor(activation='logistic')

mlp.fit(X_train, y_train)
scaled_mlp.fit(scaled_X_train, scaled_y_train)

preds = mlp.predict(X_test)
# Use the model that was trained on scaled labels for the scaled predictions
scaled_preds = scaled_mlp.predict(scaled_X_test)

for pred, scaled_pred, tar, scaled_tar in zip(preds, scaled_preds, y_test, scaled_y_test):
    print("Regular MLP:")
    print("Prediction: {} | Actual: {} | Error: {}".format(pred, tar, tar-pred))
    
    print()
    print("MLP that was shown scaled labels: ")
    print("Prediction: {} | Actual: {} | Error: {}".format(scaled_pred, scaled_tar, scaled_tar-scaled_pred))

In short, shrinking your target will naturally shrink your error, since your model is not learning the actual value, but a rescaled value between 0 and 1.

That is why we do not scale our target variable: the error looks smaller only because we are forcing the values into a 0...1 space.

It is very exciting to learn a new thing like this! I thank you a million times! :) I am trying right ahead. Actually my dataset not much of a varying one but also similar to your example. So I will try out to see the results! I wish you a very perfect day! — B. Selin Zaza, Dec 19 '20 at 20:43
Sure! If you feel this answered your question, please feel free to accept my answer so others can use it too. — artemis, Dec 20 '20 at 14:07

score 1 · Answer 2 · answered Dec 18 '20 at 15:51

You can't compare the MSE between those two models, as it is scale dependent. It is clear that your MSE is smaller when you scale your y, since y (in your case) becomes smaller after applying the MinMaxScaler. When you "unscale" your predicted and actual y values and then calculate the MSE again, it should be the same as for the model with the raw y values (not 100% sure though).

Main takeaway: do not compare MSE between models, only within a model. The absolute value of MSE is hard to interpret.
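A small sketch of the scale dependence, using made-up numbers (`y_true` and `y_pred` are hypothetical, not the question's data). For one fixed set of predictions, MinMax scaling divides every error by the range of y, so the MSE shrinks by the square of that range. This says nothing about two separately trained models, which can still end up with different fits.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Hypothetical vehicle counts and predictions, in original units.
y_true = np.array([[50.0], [80.0], [120.0], [200.0]])
y_pred = np.array([[55.0], [70.0], [130.0], [190.0]])

# Fit the scaler on the true values and apply it to both arrays.
y_scaler = MinMaxScaler().fit(y_true)
mse_raw = mean_squared_error(y_true, y_pred)
mse_scaled = mean_squared_error(y_scaler.transform(y_true), y_scaler.transform(y_pred))

# Multiplying the scaled MSE by range**2 recovers the raw MSE.
y_range = y_true.max() - y_true.min()
print("raw MSE:", mse_raw)
print("scaled MSE:", mse_scaled)
print("scaled MSE * range^2:", mse_scaled * y_range ** 2)
```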

I'm not sure I get your question. I presume you're asking how, once you have trained your model with scaled y values, you can make predictions of y on the original scale, i.e. number of vehicles. If that is the question, then this is exactly why you should not scale y.

What the MinMaxScaler does is calculate the following:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min

See also the MinMaxScaler documentation. You could try to reverse-calculate the predictions afterwards, but I don't think that this makes sense.
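If you do want to reverse the scaling, MinMaxScaler provides `inverse_transform`. The sketch below assumes a dedicated scaler for y (called `y_scaler` here, a name I made up) that is kept separate from the one fitted on X, and it uses a hypothetical target column. The `scaled_predictions` array stands in for model output in the scaled space.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical target column (vehicles per hour), shaped (n_samples, 1).
y = np.array([[40.0], [81.0], [115.0], [102.0], [60.0]])

# Dedicated scaler for the target; default feature_range is (0, 1).
y_scaler = MinMaxScaler()
y_scaled = y_scaler.fit_transform(y)

# Same result by hand, following the formula above with min=0 and max=1.
y_manual = (y - y.min(axis=0)) / (y.max(axis=0) - y.min(axis=0))

# Stand-in for model output in the scaled space, e.g. mlp.predict(Xnew).
scaled_predictions = np.array([0.08533431, 0.1402755, 0.19497315])

# inverse_transform expects a 2D array, so reshape to a column first.
predictions = y_scaler.inverse_transform(scaled_predictions.reshape(-1, 1)).ravel()
print(predictions)
```

The key design point is that the scaler used for the inverse step must be the one fitted on y. Re-fitting a single scaler object on Y after X overwrites what it learned from X.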

A classic scale-invariant metric is Mean Absolute Scaled Error, which was designed specifically for forecasting methods: https://en.wikipedia.org/wiki/Mean_absolute_scaled_error — blacksite, Dec 18 '20 at 15:54
It was really enlightening, thanks a lot! I was a bit confused but now I see it better. — B. Selin Zaza, Dec 19 '20 at 20:38

score 1 · Answer 3 · Steven Barnard · 2020-12-18T17:47:49.363

Since this is a regression problem, you likely wouldn't want to scale the target/response variable in this case. You will want to scale certain features, especially large-magnitude ones that are paired with other features that may be, for example, binary. But without seeing the full dataset I can't confirm whether this would be the way to go here. Also, you should be comparing this to a baseline model. An MSE of 45,000 may seem bad, but if the MSE of the baseline is 10X that, then your model has cut the baseline MSE by 90%.
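One possible way to set up such a baseline comparison is sklearn's DummyRegressor, which always predicts the training mean. The sketch below uses synthetic data and a LinearRegression as a stand-in for the real model; both are assumptions for illustration, not the question's setup.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: two features and a noisy linear target (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline: always predict the mean of the training targets.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
# Stand-in for the actual model (e.g. an MLPRegressor).
model = LinearRegression().fit(X_train, y_train)

mse_baseline = mean_squared_error(y_test, baseline.predict(X_test))
mse_model = mean_squared_error(y_test, model.predict(X_test))

# Fraction of the baseline MSE that the model removes.
relative_reduction = 1 - mse_model / mse_baseline
print(mse_baseline, mse_model, relative_reduction)
```

Because both MSE values are in the same units, their ratio is meaningful even when the absolute numbers look large.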

TL;DR Only attempt to scale if features differ in magnitude or there are large outliers in a given feature.

If you don't scale your target variable, you shouldn't need to 'rescale' it. However, if you need or want to, you can look into using sklearn's TransformedTargetRegressor.
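A rough sketch of how TransformedTargetRegressor could be wired up for this kind of problem. The data is synthetic, and the two feature columns (vehicles this hour, average speed) are assumptions based on the question's description. The wrapper scales y before fitting and applies the inverse transform inside `predict`, so predictions come back in the original units.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic inputs: [vehicles this hour, average speed this hour] (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform([20, 30], [200, 90], size=(300, 2))
# Synthetic target: next hour's vehicle count, loosely tied to the current count.
y = 0.9 * X[:, 0] + rng.normal(0, 5, size=300)

model = TransformedTargetRegressor(
    # Features are scaled inside the pipeline, before the MLP sees them.
    regressor=make_pipeline(
        MinMaxScaler(),
        MLPRegressor(activation="logistic", max_iter=2000, random_state=0),
    ),
    # The target is scaled for training and unscaled again in predict().
    transformer=MinMaxScaler(),
)
model.fit(X, y)

X_new = np.array([[80.0, 40.0], [115.0, 33.0]])
preds = model.predict(X_new)
print(preds)
```

This keeps the X scaler and the y scaler as separate objects, so there is no manual `inverse_transform` step to forget.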

edited Dec 18 '20 at 17:47

answered Dec 18 '20 at 15:56

Steven Barnard


accuracy, F1 score, recall, and precision apply for classification rather than regression models. I think MASE as @blacksite suggested is a better suggestion – TiTo Dec 18 '20 at 16:00
thank you! hadn't had morning coffee yet haha confusion matrices only apply to classification. Edit made – Steven Barnard Dec 18 '20 at 17:48
That was a perfect explanation. I really appreciate your time and effort. And you are right then about the second part. Dataset values are not so varying, so I should better not to scale them. I see now. I hope you a perfect day! @StevenBarnard – B. Selin Zaza Dec 19 '20 at 20:40
