
I am working on a simple linear regression model as practice while learning machine learning. My model runs correctly, but it gets a bad score, which suggests it is a poor model, so any advice on improving it would be appreciated. Here is my model:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

########## reading training set ##########

data = pd.read_csv("train.csv", delimiter=",", header=0)
x = data[['Col1', 'Col2']]
y = data['Expected']

########## building model ##########

reg = LinearRegression()
reg.fit(x, y)

########## reading test set and making predictions ##########

data_test = pd.read_csv("test.csv", delimiter=",", header=0)
x_test = data_test[['Col1', 'Col2']]
prediction = reg.predict(x_test)
np.savetxt("prediction.txt", prediction, delimiter=',')
Halawa
  • Two things: 1) just printing the code you use to run the linear regression isn't useful. A sample of the dataset you're analysing would help more than a code snippet similar to every snippet in the `scikit-learn` docs. And 2) you should also ask this on [Cross Validated](http://stats.stackexchange.com/), a StackOverflow spin-off for Machine Learning and Data Analysis. – lsdr Apr 24 '15 at 17:01
  • I use a train.csv file that contains 3 columns: the inputs "Col1" and "Col2", and the output "Expected". I think they are random values or something like that and do not represent anything; however, I can upload them if that will help you give me good advice :D – Halawa Apr 24 '15 at 17:26
  • Since you have just 2 features, why not visualize your data (in 3D, apparently) to see if it has a linear shape? – Artem Sobolev Apr 25 '15 at 11:13
  • How do I do that? How do I visualize my data? Excuse me, I am new to Python. – Halawa Apr 25 '15 at 20:40
  • You could simply use Excel or any spreadsheet to plot a 2D graph and visualize it to see if it has any correlation. – lsdr Apr 27 '15 at 15:15
  • But, since you're playing with Python, check this answer here in StackOverflow: http://stackoverflow.com/questions/6323737/make-a-2d-pixel-plot-with-matplotlib or the plotting example of Pandas: http://pandas.pydata.org/pandas-docs/stable/visualization.html – lsdr Apr 27 '15 at 15:15
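
To make the visualization suggestion in the comments concrete, here is a minimal sketch of a 3D scatter plot of the two features against the target using matplotlib. The real train.csv is not shown, so the DataFrame below is synthetic stand-in data, and `scatter_3d` is a hypothetical helper name, not something from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_3d(df, x_col, y_col, z_col):
    """Plot two feature columns against a target column in 3D."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(df[x_col], df[y_col], df[z_col], s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.set_zlabel(z_col)
    return fig, ax


# Synthetic stand-in for train.csv (assumption: same column names as in the question).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Col1": rng.normal(size=200),
    "Col2": rng.normal(size=200),
})
data["Expected"] = 3.0 + 2.0 * data["Col1"] - data["Col2"] + rng.normal(scale=0.5, size=200)

fig, ax = scatter_3d(data, "Col1", "Col2", "Expected")
# plt.show()  # uncomment when running interactively
```

With the real data, replace the synthetic DataFrame with `pd.read_csv("train.csv")`. If the points lie roughly on a tilted plane, a linear model is plausible; a curved or shapeless cloud suggests otherwise.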

1 Answer

It may not be that linear regression is a bad model, but that your variables are not properly transformed to avoid regression issues. In many circumstances, nonlinearity is also due to artifacts within the data rather than to the wrong use of a linear regression model for the variables involved.

Are you pre-processing all the variables so that they are weak-sense stationary (WSS)? Are the variables all expressed in the same terms (for example, percentage change)? Have you checked for homoscedasticity and serial correlation in the regression residuals? Is your data balanced or unbalanced (positive to negative elements)? Have you checked your data for normality and, if it is not normal, applied a proper transformation (Box-Cox or other)? If the data you are using in the regression has any one or a combination of these issues, your results may not be valid. Please run tests for all the issues mentioned, so that you are sure you are providing the variables to the regression in an adequate form and the results are interpretable and valid.
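
As a rough illustration of some of these checks, here is a minimal sketch using NumPy, SciPy, and scikit-learn. The data is synthetic (the real train.csv is not shown), and the Spearman-based heteroscedasticity check is a simple substitute I chose for illustration, not a formal test such as Breusch-Pagan. Stationarity tests are left out because they only make sense for time-ordered data.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the training data (assumption: two features, one target).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Normality of residuals: Shapiro-Wilk test.
# A small p-value suggests the residuals are not normally distributed.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Serial correlation: Durbin-Watson statistic, computed by hand.
# Values near 2 suggest little first-order autocorrelation.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Rough heteroscedasticity check: rank correlation between |residual|
# and the fitted value. A strong correlation hints at non-constant variance.
spearman_rho, spearman_p = stats.spearmanr(np.abs(residuals), fitted)

# Box-Cox transformation of a skewed, strictly positive variable.
# With lmbda left unset, SciPy also returns the estimated lambda.
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=200)
transformed, boxcox_lambda = stats.boxcox(skewed)

print(f"Shapiro-Wilk p-value:       {shapiro_p:.3f}")
print(f"Durbin-Watson statistic:    {durbin_watson:.3f}")
print(f"Spearman rho (|res|, fit):  {spearman_rho:.3f} (p={spearman_p:.3f})")
print(f"Estimated Box-Cox lambda:   {boxcox_lambda:.3f}")
```

Box-Cox requires strictly positive input, so a variable containing zeros or negative values would need a shift or a different transformation first.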

Also, which measure of error are you using: RMSE, R2, or another? Each measure has its own issues. Is the training sample statistically significant, so that it provides statistical validity?
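
For reference, here is a minimal sketch of computing both RMSE and R2 on a held-out split with scikit-learn. The data is synthetic and the 80/20 split is an arbitrary choice for illustration; neither comes from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training data (assumption: two features, one target).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

# Hold out 20% of the rows so the score reflects unseen data.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_val)

# RMSE is in the units of the target; R2 is unitless (1.0 is a perfect fit).
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
r2 = r2_score(y_val, y_pred)

print(f"RMSE: {rmse:.3f}")
print(f"R2:   {r2:.3f}")
```

Scoring on rows the model never saw during fitting avoids an over-optimistic estimate, and looking at RMSE next to R2 helps because R2 alone says nothing about the size of the errors in the target's units.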

I would look at these first, as they are usually the root of many problems when using regression, before concluding that linear regression is not an adequate model.

Barnaby