How to split a dataset into train and test rows by date using pandas and Python?

Question

How can I split a dataset into train and test rows by date in Python, so that the first 90% (from 2018-01-01 until 2019-02-01) is the training data and the last 10% (from 2019-02-02 onward) is the test data? I do not want a random split.

I believe `df_train = df[df['date'] <= '2019-02-01']` and `df_test = df[df['date'] >= '2019-02-02']` should do the trick — Louis, Jun 02 '20 at 09:50
@Louis this code, `from sklearn.model_selection import train_test_split` followed by `train_features, test_features, train_labels, test_labels = train_test_split(features, labels, test_size=0.25, random_state=42)`, splits the data randomly. I want something similar, but splitting the data by date. — Themba Mahlasela, Jun 02 '20 at 10:10
If you sort your dataframe by date and then call `sklearn.model_selection.train_test_split` with the parameter `shuffle` set to `False`, you should get the result you want. — Louis, Jun 02 '20 at 11:48
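A minimal sketch of a date-threshold split, assuming a column named `date` that `pd.to_datetime` can parse. The sample frame, the `value` column, and the `cutoff` name are made up for illustration. Comparing both sides against a single cutoff (`<=` for train, `>` for test) puts every row in exactly one of the two sets, whereas two strict comparisons against different dates would drop the rows that fall on those dates.

```python
import pandas as pd

# Hypothetical sample data: one row per day from 2018-01-01 to 2019-03-16
df = pd.DataFrame({"date": pd.date_range("2018-01-01", "2019-03-16", freq="D")})
df["value"] = range(len(df))

# Make sure the column is datetime64 rather than strings
df["date"] = pd.to_datetime(df["date"])

# Everything up to and including the cutoff is train, the rest is test
cutoff = pd.Timestamp("2019-02-01")
df_train = df[df["date"] <= cutoff]
df_test = df[df["date"] > cutoff]
```

With a fixed cutoff date the train/test proportion depends on how many rows fall on each side, so it will only be 90/10 if the data happens to be distributed that way.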

score 0 · Answer 1 · answered Jun 02 '20 at 17:12


As explained in this SO post, you can split it with `np.split`:

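import numpy as np

# Sort chronologically so that the last rows are the most recent ones
df = df.sort_values('date')
data = df.values  # NumPy array; the column names are dropped here
# Cut at the 90% mark: first 90% of rows for training, the rest for testing
train_set, test_set = np.split(data, [int(.9 * len(data))])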
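If the column names are needed afterwards, the same 90/10 cut can be made with positional slicing on the DataFrame itself rather than on the raw array. This is only a sketch on a made-up frame; `split_idx` and the `value` column are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical data, deliberately shuffled to show that sorting matters
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=100, freq="D"),
    "value": range(100),
}).sample(frac=1, random_state=0)

# Sort chronologically, then cut at the 90% mark by position
df = df.sort_values("date").reset_index(drop=True)
split_idx = int(0.9 * len(df))
train_set = df.iloc[:split_idx]
test_set = df.iloc[split_idx:]
```

Both pieces remain DataFrames, so `train_set["date"]` and the other columns stay accessible by name.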


above_c_level


score 0 · Answer 2 · answered Jan 27 '22 at 10:48

If your data is already sorted by date within the pandas DataFrame, simply pass `shuffle=False`:

from sklearn.model_selection import train_test_split

# Optional: separate the target column before passing the data to train_test_split.
# target_attribute = df['column_name']
# df = df.drop(columns=['column_name'])

trainingSet, testSet = train_test_split(df,
                                        # target_attribute,
                                        test_size=0.2,
                                        random_state=42,  # has no effect when shuffle=False
                                        # stratify=y,  # not supported with shuffle=False
                                        shuffle=False)  # keep row order: the last 20% is the test set
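The same call also accepts features and labels separately, which matches the four-way unpacking mentioned in the question's comments. A sketch assuming scikit-learn is installed and a hypothetical target column named `target`; the sample frame is made up, and `test_size=0.1` is chosen here to mirror the 90/10 split from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data, sorted chronologically before splitting
df = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=50, freq="D"),
    "feature": range(50),
    "target": [i % 2 for i in range(50)],
}).sort_values("date")

features = df.drop(columns=["target"])
labels = df["target"]

# shuffle=False keeps the row order, so the test set is the most recent 10%
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.1, shuffle=False
)
```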
