My issue is similar to this question, but I did not understand the answer there and need further clarification.

I am using sklearn's linear regression for the first time to add more data points to my dataset. Adding more data points will help me identify outliers more accurately. I have built my model and obtained the predictions, but I want the model to return predicted values within a certain range. Is it possible to achieve this?

I would like to predict values in a column called 'delivery_fee'. The values in the column start at 3 and increase steadily until they reach 27. The last value in the column, which comes right after 27, is 47.

I would like the model to predict values between 27 and 47.

My code:

from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Work on a copy of the original dataframe (outlierFileNew is defined elsewhere)
delivery_linreg = outlierFileNew.copy()

# Encode the categorical branch_code column as integers
le = preprocessing.LabelEncoder()
delivery_linreg['branch_code'] = le.fit_transform(delivery_linreg['branch_code'])

# Features: every column in the dataframe except delivery_fee
x = delivery_linreg.drop(columns=['delivery_fee'])
# Target: delivery_fee is the column to be predicted
y = delivery_linreg['delivery_fee']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Fit a linear regression on the training set and predict on the test set
linreg = LinearRegression()
linreg.fit(x_train, y_train)
delivery_predict = linreg.predict(x_test)

My model returns values that range from 4 to 17, which is not the range I want. Any suggestions on how to change the predicted range?
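For reference, one way to bound predictions after the fact is to clip them with `numpy.clip`. The following is only a minimal sketch on synthetic stand-in data: the feature `distance_km` and all numbers are hypothetical and not taken from the dataset above. Clipping moves out-of-range predictions to the nearest bound; it does not make the model learn fees between 27 and 47.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in data: a single feature, with fees rising from 3 to 27
rng = np.random.default_rng(0)
distance_km = np.linspace(1, 10, 50).reshape(-1, 1)
delivery_fee = np.linspace(3, 27, 50) + rng.normal(0, 0.5, 50)

linreg = LinearRegression().fit(distance_km, delivery_fee)
raw_predictions = linreg.predict(distance_km)

# Force every prediction into the closed interval [27, 47]
lower, upper = 27, 47
clipped_predictions = np.clip(raw_predictions, lower, upper)

print("raw range:    ", raw_predictions.min(), raw_predictions.max())
print("clipped range:", clipped_predictions.min(), clipped_predictions.max())
```

With data like this, nearly every clipped value collapses onto the lower bound of 27, which suggests that clipping alone is unlikely to produce useful new data points.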

Thank you,

leena
  • Possible duplicate of [How to add a range to sklearn's linear regression predictions](https://stackoverflow.com/questions/51317311/how-to-add-a-range-to-sklearns-linear-regression-predictions) – Shishdem Oct 04 '19 at 05:09
  • I have referenced this question in my question. I need further clarification. Thanks. – leena Oct 04 '19 at 05:43
  • If I understand correctly, your training set doesn't have any target value in the (27, 47) range, right? Then it seems to me that your model is working fine. It can only learn from the data you provide. Can you please tell us why you need the prediction range to be (27, 47)? – Shihab Shahriar Khan Oct 04 '19 at 09:08
  • Because I need to detect outliers and I can achieve better results if I have this data range added to my dataset. My model is working but it is not returning results in the required range. I want to learn how to specify that range. Thanks – leena Oct 04 '19 at 10:02
  • To me, you are faking some data; this is the wrong approach to detecting outliers... – PV8 Oct 04 '19 at 11:54
  • It is an assignment; I have to solve it this way. – leena Oct 04 '19 at 20:48
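
The point raised in the comments, that a linear model can only reproduce the relationship present in its training data, can be illustrated with a small sketch. It assumes synthetic data with a single hypothetical feature `distance_km` (not a column from the question's dataset). Under that assumption, predictions land between 27 and 47 only when the inputs lie beyond the feature values seen in training, that is, when the model extrapolates.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: fee rises linearly from 3 (at 1 km) to 27 (at 10 km)
distance_km = np.linspace(1, 10, 50).reshape(-1, 1)
delivery_fee = np.linspace(3, 27, 50)

linreg = LinearRegression().fit(distance_km, delivery_fee)

# Inputs inside the training feature range stay inside the training target range
inside = linreg.predict(np.array([[2.0], [5.0], [9.0]]))

# Inputs beyond the training feature range extend the same line past 27
outside = linreg.predict(np.array([[11.0], [14.0], [17.0]]))

print("inside training range: ", inside)
print("outside training range:", outside)
```

Whether such extrapolated values are meaningful for outlier detection is a separate question; the sketch only shows where values above 27 would have to come from.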

0 Answers