I have an Excel sheet with 2 columns and 1000 rows. I want to pass it as input to a linear regression fit using sklearn. When I create a DataFrame with pandas, how do I supply the inputs, e.g. df_x = pd.DataFrame(...)?

Without a DataFrame, this works:

npMatrix=np.matrix(raw_data)
X,Y=npMatrix[:,1],npMatrix[:,2]

md1=LinearRegression().fit(X,Y)

Can you show me how to access the rows with pandas?

user826407
  • `df = pd.read_excel(...)`, `arr = df.as_matrix()` – furas Nov 26 '17 at 02:30
  • How can I get X and Y from the pandas object? I didn't follow, can you please give more info? – user826407 Nov 26 '17 at 02:34
  • If you need a NumPy array, then `arr = df.as_matrix()` and `X = arr[:,0]`, `Y = arr[:,1]`, or `X = df[0].as_matrix()`, `Y = df[1].as_matrix()` – furas Nov 26 '17 at 02:38
  • Thanks. With NumPy I am able to do it as mentioned in the question. I want to get it as a DataFrame – user826407 Nov 26 '17 at 02:40
  • If you need a DataFrame, then `X = df[0]` and `Y = df[1]`. If your columns have names (e.g. `"column1"`, `"column2"`), then `X = df["column1"]` and `Y = df["column2"]` – furas Nov 26 '17 at 02:43

2 Answers

I think you can convert a pandas DataFrame to a NumPy array with np.array().

This is discussed here: Quora: How does python-pandas go along with scikit-learn library?

The example, by Muktabh Mayank, is copied below:

>>> from pandas import *
>>> from numpy import *
>>> new_df = DataFrame(array([[1,2,3,4],[5,6,7,8],[9,8,10,11],[16,45,67,88]]))
>>> new_df.index= ["A1","A2","A3","A4"]
>>> new_df.columns= ["X1","X2","X3","X4"]
>>> new_df
    X1  X2  X3  X4
A1   1   2   3   4
A2   5   6   7   8
A3   9   8  10  11
A4  16  45  67  88
>>> array(new_df)
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9,  8, 10, 11],
       [16, 45, 67, 88]], dtype=int64)
>>>

And btw, people are actually working on bridging sklearn and pandas: sklearn-pandas
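As a rough sketch of the same conversion with explicit imports: `DataFrame.to_numpy()` is the current pandas method for this (the `as_matrix()` call mentioned in the comments was removed in pandas 1.0). The small DataFrame and its column names `x` and `y` are made up here to stand in for the two-column sheet.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the two-column sheet from the question.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})

arr = df.to_numpy()    # whole frame as a 2-D NumPy array, shape (4, 2)
same = np.array(df)    # np.array(df) yields the same values

X = arr[:, [0]]        # list index keeps X two-dimensional, shape (4, 1)
Y = arr[:, 1]          # plain index gives a 1-D array, shape (4,)
```

Indexing with `[0]` instead of `0` keeps `X` as a column matrix, which is the shape sklearn estimators expect for their feature input.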

Qihong

You can read the Excel file with pandas:

df = pd.read_excel(...)

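You can get a single column by its label. When the sheet is read with `header=None`, the labels are the integers 0 and 1 (use `df.iloc[:, 0]` to select by position instead):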

X = df[0] 
Y = df[1] 

If the columns have names, e.g. "column1" and "column2":

X = df["column1"] 
Y = df["column2"]

This returns each column as a Series.
If you need a single column as a DataFrame, pass a list of columns:

X = df[ [0] ]
Y = df[ [1] ]
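The distinction matters for sklearn, because `fit` expects the features `X` to be two-dimensional. A minimal sketch of the whole flow, assuming column names `column1` and `column2` and using a small made-up DataFrame in place of the result of `pd.read_excel(...)`:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for pd.read_excel(...); here y = 2*x + 1.
df = pd.DataFrame({
    "column1": [0.0, 1.0, 2.0, 3.0],
    "column2": [1.0, 3.0, 5.0, 7.0],
})

X = df[["column1"]]   # list of columns -> DataFrame, shape (4, 1)
Y = df["column2"]     # single label -> Series, shape (4,)

model = LinearRegression().fit(X, Y)
slope = model.coef_[0]          # fitted coefficient for column1
intercept = model.intercept_    # fitted intercept
```

Passing `df["column1"]` (a Series) as `X` would raise an error about a 1-D array, which is why the list-of-columns form is used for the features.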

More: How to get column by number in Pandas?

furas