My issue is similar to this question, but I did not understand the answer there, so I need further clarification.
I am using sklearn linear regression for the first time to add more data points to my dataset. Adding more data points will help me identify outliers more accurately. I have built my model and got the predictions, but I want the model to return predicted values within a certain range. Is it possible to achieve this?
I would like to predict values in a column called 'delivery_fee'.
The values in the column start at 3 and increase steadily until they reach 27.
The last value in the column, which comes right after 27, is 47.
I would like the model to predict values between 27 and 47.
My code:
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
#create a copy of the dataframe
delivery_linreg = outlierFileNew.copy()
le = preprocessing.LabelEncoder()
delivery_linreg['branch_code'] = le.fit_transform(delivery_linreg['branch_code'])
#select all columns in the dataframe except for delivery_fee
x = delivery_linreg.drop(columns='delivery_fee')
#selecting delivery_fee as the column to be predicted
y = delivery_linreg.delivery_fee
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
#fitting a linear regression model to the training set (all remaining columns are used as features)
linreg = LinearRegression()
linreg.fit(x_train,y_train)
delivery_predict = linreg.predict(x_test)
My model returns values that range from 4 to 17, which is not the range I want. Any suggestions on how to change the predicted range?
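To make the goal concrete, here is a minimal sketch of one possible way to force predictions into a fixed interval: post-processing them with `numpy.clip`. The data below is synthetic and hypothetical (it is not the real delivery dataset), and the bounds 27 and 47 are taken from the description above. Clipping only changes the output values; it does not change what the model has learned, so it may not be the right approach if the predictions are meant to reflect the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical; not the real delivery dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3 + 2.4 * X[:, 0] + rng.normal(0, 1, size=100)

# Fit an ordinary linear regression and get unconstrained predictions
linreg = LinearRegression().fit(X, y)
raw_predictions = linreg.predict(X)

# Force every prediction into [LOWER, UPPER]:
# values below LOWER become LOWER, values above UPPER become UPPER
LOWER, UPPER = 27, 47
clipped_predictions = np.clip(raw_predictions, LOWER, UPPER)

print(raw_predictions.min(), raw_predictions.max())
print(clipped_predictions.min(), clipped_predictions.max())
```

With real data, `raw_predictions` would be replaced by `delivery_predict` from the code above.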
Thank you.