How to use train_test_split in this situation?

Question

 import pandas as pd
 from sklearn.model_selection import train_test_split
 Data1 = pd.read_csv(r"C:\Users\Zihao\Desktop\New\OBSTET.csv", index_col=0)
 Data1.fillna(0, inplace=True)
 Dependent = Data1.iloc[:, 0]
 X_train, y_train, x_test, y_test = train_test_split()

This is my data. I know that the first column is the dependent variable, and the rest of the columns are independent variables.

How do I split this? I am not sure which argument I should pass.

score 2 · Accepted Answer · edited Apr 24 '18 at 04:34

If you are trying to predict your Dependent variable, that is your "y", and the Independent variables are your "X".

If that is the case:

# .ix was removed in pandas 1.0; .iloc selects by integer position
Dependent = Data1.iloc[:, 0]     # your "y"
Independent = Data1.iloc[:, 1:]  # the rest of the columns (commonly referred to as "X")
X_train, x_test, y_train, y_test = train_test_split(Independent, Dependent)

By default, that puts 75% of your data into X_train and y_train, and the other 25% into x_test and y_test.
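To make the split sizes concrete, here is a minimal sketch. It assumes a small made-up DataFrame in place of OBSTET.csv (the column names `target`, `a`, `b`, `c` are hypothetical), and it passes `test_size` and `random_state` explicitly so the proportion is visible and the shuffle is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for OBSTET.csv: 20 rows, first column is the target.
rng = np.random.default_rng(0)
Data1 = pd.DataFrame(rng.normal(size=(20, 4)), columns=["target", "a", "b", "c"])

Dependent = Data1.iloc[:, 0]     # y: the first column
Independent = Data1.iloc[:, 1:]  # X: every remaining column

# test_size=0.25 matches the default; random_state fixes the shuffle.
X_train, x_test, y_train, y_test = train_test_split(
    Independent, Dependent, test_size=0.25, random_state=42
)

print(X_train.shape, x_test.shape)  # (15, 3) (5, 3)
```

The inputs can stay as a pandas DataFrame and Series; the returned pieces keep those types, so no manual conversion to arrays is needed here.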

edited Apr 24 '18 at 04:34

Vivek Kumar

answered Apr 24 '18 at 02:20

Cole Howard


Thanks, I see. X and y both need to be arrays, right? – Tom Apr 24 '18 at 05:22
Hello, another follow-up question: how would I slice this if the third column is the dependent variable and the rest of the columns are independent variables? @Cole Howard – Tom Apr 24 '18 at 20:15
What dimension are you looking to slice across? The train_test_split function is already slicing the rows for you. – Cole Howard Apr 24 '18 at 20:18
I am speaking hypothetically about another situation. If I have 5 columns and the third column is the dependent variable, while the first, second, fourth, and fifth columns are independent, how do I code that? @Cole Howard – Tom Apr 24 '18 at 20:22
Both indexer arguments accept lists or sequences. So: `Independent = Data1.iloc[:, [0, 1, 3, 4]]` (written with `.iloc`, since `.ix` was removed in pandas 1.0). See the answers to this question for more like this: https://stackoverflow.com/questions/10665889/how-to-take-column-slices-of-dataframe-in-pandas – Cole Howard Apr 25 '18 at 00:44
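For the hypothetical five-column case raised in the comments, a minimal sketch might look like the following. The frame `Data2` and its column names `c0` to `c4` are made up for illustration; dropping the target column is shown as one alternative to listing the positions of the other columns by hand.

```python
import pandas as pd

# Hypothetical five-column frame; the third column (position 2) is the target.
Data2 = pd.DataFrame({
    "c0": [1, 2, 3],
    "c1": [4, 5, 6],
    "c2": [0, 1, 0],
    "c3": [7, 8, 9],
    "c4": [10, 11, 12],
})

Dependent = Data2.iloc[:, 2]               # y: third column
Independent = Data2.iloc[:, [0, 1, 3, 4]]  # X: explicit list of positions

# Alternative: drop the target column instead of listing all the others.
Independent_by_drop = Data2.drop(columns=Data2.columns[2])
```

Both selections produce the same four-column frame; the drop form avoids rewriting the position list when the number of columns changes.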
