import pandas as pd
from sklearn.model_selection import train_test_split
Data1 = pd.read_csv(r"C:\Users\Zihao\Desktop\New\OBSTET.csv", index_col=0)
Data1.fillna(0, inplace=True)
Dependent = Data1.iloc[:, 0]  # originally .ix, which was removed in pandas 1.0
X_train, x_test, y_train, y_test = train_test_split()  # not sure which arguments to pass here

This is my code. The first column of the data is the dependent variable, and the rest of the columns are independent variables.

How do I split this? I am not sure which arguments I should pass.

1 Answer

If you are trying to predict your dependent variable, that is your "y", while the independent variables are your "X".

If that is the case:

Dependent = Data1.iloc[:, 0]     # your "y" (.iloc replaces .ix, which was removed in pandas 1.0)
Independent = Data1.iloc[:, 1:]  # the rest of the columns (commonly referred to as "X")
X_train, x_test, y_train, y_test = train_test_split(Independent, Dependent)

By default, that puts a randomly shuffled 75% of the rows into X_train and y_train, and the other 25% into x_test and y_test.
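A minimal runnable sketch of the same split, using a small made-up DataFrame in place of OBSTET.csv. The column names `target`, `a`, `b`, `c`, the random data, and the `random_state` value are illustrative assumptions; `random_state` is optional and only makes the shuffle reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for OBSTET.csv: 20 rows, first column is the target.
rng = np.random.default_rng(0)
Data1 = pd.DataFrame(rng.normal(size=(20, 4)), columns=["target", "a", "b", "c"])

Dependent = Data1.iloc[:, 0]     # y: the first column
Independent = Data1.iloc[:, 1:]  # X: all remaining columns

# With neither test_size nor train_size given, 25% of the rows go to the test set.
X_train, x_test, y_train, y_test = train_test_split(
    Independent, Dependent, random_state=42
)

print(X_train.shape, x_test.shape)  # (15, 3) (5, 3)
```

Note the return order: the X pieces come first (train, test), then the y pieces (train, test).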

  • Thanks, I see. X and Y both need to be arrays, right? – Tom Apr 24 '18 at 05:22
  • Hello, another follow-up question: how would I slice this if the third column is the dependent variable and the rest of the columns are independent variables? @Cole Howard – Tom Apr 24 '18 at 20:15
  • What dimension are you looking to slice across? The train_test_split function is already slicing the rows for you. – Cole Howard Apr 24 '18 at 20:18
  • I am speaking hypothetically about another situation. If I have 5 columns and the third column is the dependent variable, while the first, second, fourth and fifth columns are independent, how do I code that? @Cole Howard – Tom Apr 24 '18 at 20:22
  • The two parameters of .ix (rows, columns) can each be a list or other sequence. So: `Independent = Data1.ix[:, [0, 1, 3, 4]]` (in pandas 1.0 and later, where `.ix` no longer exists, use `Data1.iloc[:, [0, 1, 3, 4]]`). See the answer on this question for more like this: https://stackoverflow.com/questions/10665889/how-to-take-column-slices-of-dataframe-in-pandas – Cole Howard Apr 25 '18 at 00:44
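For the hypothetical five-column case from the comments, a sketch along these lines should work. The column names `c1` to `c5` and the random data are made up for illustration. `train_test_split` accepts the DataFrame and Series directly, so they need not be converted to NumPy arrays first, and the pieces come back as DataFrame/Series.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 5-column frame; the third column (position 2) is the target.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(12, 5)), columns=["c1", "c2", "c3", "c4", "c5"])

y = df.iloc[:, 2]             # third column, selected by position
X = df.iloc[:, [0, 1, 3, 4]]  # same positions as in the comment above
X_alt = df.drop(columns=df.columns[2])  # equivalent, without listing positions

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(list(X.columns), y.name)  # ['c1', 'c2', 'c4', 'c5'] c3
```

The `drop` variant avoids hard-coding the independent columns, which is handy when there are many of them.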