There are multiple ways to do this; I will run through a few of them.
Slicing is a powerful feature in Python and takes its arguments as data[start:stop:step]. The stop index is exclusive, so data[0:800] returns the first 800 items.
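As a quick illustration of the start/stop/step arguments, here is a minimal sketch on a plain list (the list `data` is just a made-up example):

```python
# A small list to illustrate data[start:stop:step]
data = list(range(10))  # [0, 1, 2, ..., 9]

first_three = data[0:3]     # start=0, stop=3 (exclusive) -> [0, 1, 2]
from_seven = data[7:]       # omitted stop means "to the end" -> [7, 8, 9]
every_other = data[::2]     # step=2 over the whole list -> [0, 2, 4, 6, 8]
reversed_data = data[::-1]  # a negative step walks backwards -> [9, 8, ..., 0]

print(first_three, from_seven, every_other)  # [0, 1, 2] [7, 8, 9] [0, 2, 4, 6, 8]
```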
In your case, if you want the first 800 rows for training, with your input features in a DataFrame named train and your output features in Y, you can use:
X_train = train[0:800]
X_test = train[800:]
y_train = Y[0:800]
y_test = Y[800:]
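A self-contained version of the split above, assuming hypothetical data with 1000 rows (the column names and values are invented for illustration; only the names train and Y come from the question):

```python
import pandas as pd

# Hypothetical data: 1000 rows of input features in `train`, matching targets in `Y`
train = pd.DataFrame({"feature_a": range(1000), "feature_b": range(1000, 2000)})
Y = pd.Series(range(1000), name="target")

# Slicing a DataFrame/Series with integer bounds selects rows by position
X_train = train[0:800]  # rows 0..799
X_test = train[800:]    # rows 800..999
y_train = Y[0:800]
y_test = Y[800:]

print(len(X_train), len(X_test), len(y_train), len(y_test))  # 800 200 800 200
```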
The iloc indexer of a DataFrame (or Series) selects rows by integer position, whatever the index labels are, so it works even if your index is not numeric:
X_train = train.iloc[0:800]
X_test = train.iloc[800:]
y_train = Y.iloc[0:800]
y_test = Y.iloc[800:]
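To show that iloc ignores the index labels, here is a minimal sketch with a hypothetical DataFrame indexed by strings, contrasted with the label-based loc:

```python
import pandas as pd

# Hypothetical DataFrame whose index is made of strings, not numbers
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]},
                  index=["a", "b", "c", "d", "e"])

first_three = df.iloc[0:3]  # positions 0, 1, 2 -> labels "a", "b", "c"
rest = df.iloc[3:]          # positions 3, 4    -> labels "d", "e"

# .loc, by contrast, slices by label and includes the end label
by_label = df.loc["a":"c"]  # labels "a", "b", "c"

print(list(first_three.index), list(rest.index), list(by_label.index))
# ['a', 'b', 'c'] ['d', 'e'] ['a', 'b', 'c']
```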
If you only need to split the data into two parts, you can also use df.head() and df.tail() (tail(200) returns the remaining rows only if there are exactly 1000 rows in total):
X_train = train.head(800)
X_test = train.tail(200)
y_train = Y.head(800)
y_test = Y.tail(200)
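If the number of rows is not known in advance, tail() also accepts a negative n, which returns all rows except the first |n|. A minimal sketch on hypothetical data (the name n_train is made up for illustration):

```python
import pandas as pd

# Hypothetical data with 1000 rows
train = pd.DataFrame({"feature": range(1000)})

n_train = 800
X_train = train.head(n_train)
# A negative n makes tail() drop the first |n| rows,
# so the test part is always the complement of head(n_train)
X_test = train.tail(-n_train)

assert X_test.equals(train.tail(len(train) - n_train))
print(len(X_train), len(X_test))  # 800 200
```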
There are other ways to do it too, but I would recommend the first method: plain slicing works the same way across many data types and also works on NumPy arrays. To learn more about slicing, I suggest checking out "Understanding slice notation"; it is explained there for lists, but the same notation works with most other sequence types.
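For example, the same slices carry over unchanged to NumPy arrays. A minimal sketch, assuming hypothetical arrays X (1000 samples, 3 features) and y (1000 targets):

```python
import numpy as np

# Hypothetical NumPy arrays: 1000 samples with 3 features each, plus 1000 targets
X = np.arange(3000).reshape(1000, 3)
y = np.arange(1000)

# Exactly the same slice syntax as with the DataFrame
X_train, X_test = X[0:800], X[800:]
y_train, y_test = y[0:800], y[800:]

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (800, 3) (200, 3) (800,) (200,)
```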