I successfully split my dataset into train and test sets in a 70:30 ratio using this:

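import numpy as np

# Adds a column of random values (1-D, one per row); the mask below does not use it
df_glass['split'] = np.random.randn(df_glass.shape[0])
# Each row goes to train with probability 0.7, so the split is only approximately 70:30
msk = np.random.rand(len(df_glass)) <= 0.7
train = df_glass[msk]
test = df_glass[~msk]
print(train)
print(test)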

Now, how do I split train and test into X_train, y_train, X_test and y_test, such that X denotes the features of the dataset and y denotes the response?

I need to do supervised learning and apply ML models to X_train and y_train.

My database looks like this: Database_snippet

Vivek Kalyanarangan
Gaurav Singh

2 Answers

3

Scikit-learn has a convenience function, train_test_split, that works directly on pandas DataFrames. It splits the rows and separates features from response in one call:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[list_of_X_cols], df['y'], test_size=0.33, random_state=42)
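For a layout like the one described in the question (five feature columns followed by one response column), a minimal sketch might look like the following. The frame `df`, the column names `f1` to `f5` and `target`, and the random data are all made up for illustration; `test_size=0.3` is chosen here to mirror the 70:30 ratio the question asks about.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 5 feature columns and 1 response column.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 5)), columns=["f1", "f2", "f3", "f4", "f5"])
df["target"] = rng.integers(0, 3, size=100)

list_of_X_cols = ["f1", "f2", "f3", "f4", "f5"]

# test_size=0.3 holds out 30% of the rows; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    df[list_of_X_cols], df["target"], test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Because features and response are passed together, the rows of X_train and y_train stay aligned (they share the same index), which is what a supervised estimator's fit method expects.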
Vivek Kalyanarangan
2

I guess you may find this useful for understanding the idea:

import pandas as pd
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Import the dataset
dataset = pd.read_csv('Salary_Data.csv')
x = dataset.iloc[:, :-1].values  # every column except the last (features)
y = dataset.iloc[:, 1].values    # column at position 1 (the response here)

# Split the dataset into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0)
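The snippet imports LinearRegression without using it, so here is a rough sketch of what the next step could look like. Salary_Data.csv is not available here, so synthetic two-column data stands in for it; the column names `YearsExperience` and `Salary` and the numbers used to generate them are assumptions, not the real file's contents.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary_Data.csv (column names and values are assumptions).
rng = np.random.default_rng(0)
years = rng.uniform(1, 10, size=30)
dataset = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25000 + 9000 * years + rng.normal(0, 2000, size=30),
})

x = dataset.iloc[:, :-1].values  # all columns except the last -> 2-D feature array
y = dataset.iloc[:, -1].values   # last column -> 1-D response array

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/3, random_state=0
)

# Fit on the training portion only, then score on the held-out portion.
regressor = LinearRegression()
regressor.fit(x_train, y_train)
r2 = regressor.score(x_test, y_test)  # R^2 on the test set
print(x_train.shape, x_test.shape, r2)
```

Keeping x two-dimensional (via `:-1` rather than a single column index) matters because scikit-learn estimators expect features with shape (n_samples, n_features), even when there is only one feature.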
  • Hi, can you help me understand the meaning of x = dataset.iloc[:, :-1].values and y = dataset.iloc[:, 1].values? In my dataset the features are in the first 5 columns and the last column is the response. – Gaurav Singh Nov 16 '17 at 05:37
  • A few tweaks here and there and it worked! Thanks. – Gaurav Singh Nov 16 '17 at 06:05
  • iloc is integer-location based indexing, i.e. selection by position. My model was a simple linear regression with one independent variable, and I was splitting the data into x (the independent variable) and y (the dependent variable), following the linear equation y = mx + b. – Ariful Shuvo Nov 17 '17 at 04:03
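To make the iloc slicing concrete for the layout described in the comments (five feature columns, then the response), here is a small sketch. The frame `df`, its column names and the hand-written mask `msk` are hypothetical and only serve to show which columns each slice picks up.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 5 feature columns followed by 1 response column.
df = pd.DataFrame(
    np.arange(24).reshape(4, 6),
    columns=["f1", "f2", "f3", "f4", "f5", "target"],
)

X = df.iloc[:, :-1]  # all rows, every column except the last
y = df.iloc[:, -1]   # all rows, last column only

# The same slicing applies to frames that were already split by rows,
# such as the mask-based train/test frames in the question.
msk = np.array([True, True, True, False])
train, test = df[msk], df[~msk]
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

print(list(X.columns))
print(y.name)
print(X_train.shape, X_test.shape)
```

Note that iloc[:, 1] selects the column at position 1 (the second column). That coincides with the last column only when the frame has exactly two columns, so for a six-column frame iloc[:, -1] is the safer way to pick the response.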