predicting price when each season has different model

Question

I have a dataset with many columns:

There are 4 variables used for prediction: -season (sum, aut,win,spr) -express_shipment (true, False) -shipping_distance ( in KM) -first_time_customer ( true, false)

These 4 variables are used to calculate the shipping_price, with the following rule, for each season, there is a separate model that uses the above mentioned variables.

I have used an approach where, I converted True to 1 and False to 0 for the 2 Boolean columns I also converted the season in to an integer representation (1,2,3,4)

The problem is my predictions are wildly inaccurate, here is the code i am using

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split 
modeling = data.loc[:,["shipping_distance","season_int","new_cust_int","express_shipment","shipping_charge"]]
x =modeling.iloc[:,:-1]
y =modeling.iloc[:,-1:]
X_train, X_test, y_train, y_test = train_test_split(x,y, random_state = 1)
model = LinearRegression()
model.fit(X_train, y_train)
model.predict(X_test)

Is anyone able to explain what the correct approach to this problem is, and or how to solve it?

How bad are the results in terms of r2_score? – felice Oct 15 '20 at 07:46 — felice, Oct 15 '20 at 07:46

score 0 · Answer 1 · answered Oct 15 '20 at 03:41

Here you use label encoder for "season_int" (1,2,3,4) and the linear regression. That means you assign the "season_int" some intrinsic order for this model. You could try one hot encoding for "season_int":

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

score 0 · Answer 2 · answered Oct 15 '20 at 07:49

Possible answers:

You are using categorical variables for linear regression, which might be an issue. Here are possible solutions.
LinearRegression might not be the best model for your problem, since your problem might not be linear. Try non-linear models such as sklearn.ensemble.RandomForestRegressor for example.
Your dataset might not be valuable enough for the problem you are trying to solve. The variables might not be the best ones to determine the price etc.
You don't have enough data to train your model.

score -1 · Answer 3 · answered Oct 15 '20 at 03:48

-1

It seems like you want a time series model [do you ?] https://www.statsmodels.org/stable/examples/index.html#time-series-analysis

answered Oct 15 '20 at 03:48

辜乘风

152
5

predicting price when each season has different model

3 Answers3