Split dataframe into two on the basis of date

Question

I have a dataset with 1000 rows like this:

    Date,       Cost, Quantity(in ton), Source, Unloading Station
    01/10/2015, 7,    5.416,            XYZ,    ABC

I want to split the data on the basis of date. For example, everything up to 20.12.2016 is training data and everything after that is test data.

How should I split it? Is it possible?

Yes, it is possible to split data this way. Whether it's the right thing to do is contextual; your intention already seems to be to split it this way. I'm unclear on the type of answer you are expecting. Can you clarify the question? — roganjosh, May 30 '16 at 18:57
@roganjosh There is a dataset with the above labels (date, cost, quantity, source, destination). A specific date is given (e.g. 1/10/2016); I want the data up to this date as the training dataset and the rest as the test dataset. On any particular date, many quantities have been sent from source to destination. The dates are sequential, e.g. from 1/1/2015 to 1/1/2016 — kush, May 30 '16 at 19:12
what is the type of your dataset? is it a pandas data frame? — MaxU - stand with Ukraine, May 30 '16 at 19:57
@kush it's still not a question though, it's just a statement. How is your data read into Python? "How should I split?" is impossible to answer. "is it possible?" - almost certainly "yes". You need to clarify in the question what you are looking to do and, preferably, post what you have tried that doesn't work. — roganjosh, May 30 '16 at 20:56
i was using pandas data frame and it was easy to split it into different sets on the basis of date — kush, Dec 17 '16 at 10:40

score 18 · Answer 1 · answered Sep 14 '18 at 12:51


You can do that by converting your Date column to datetime with pandas `to_datetime` and setting it as the index.

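import pandas as pd
# parse the Date column into datetime values
df['Date'] = pd.to_datetime(df['Date'])
# use the dates as the index (the Date column is kept as well)
df = df.set_index(df['Date'])
# sort chronologically so that date-range slicing works
df = df.sort_index()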

Once your data is in this format, you can use dates as index labels to create the partitions:

# create train/test partitions; label-based date slices include both endpoints
train = df.loc['2015-01-10':'2016-12-20']
test = df.loc['2016-12-21':]
print('Train Dataset:', train.shape)
print('Test Dataset:', test.shape)
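
For reference, a self-contained sketch of the steps above. It assumes the dates in the question are day-first (01/10/2015 meaning 1 October 2015); the sample rows and the `format` string are illustrative assumptions, not the asker's real data. Unlike the snippet above, it passes the column name to `set_index`, so Date becomes the index only.

```python
import io
import pandas as pd

# Hypothetical sample rows in the question's layout (day/month/year assumed).
csv_text = """Date,Cost,Quantity,Source,Unloading Station
01/10/2015,7,5.416,XYZ,ABC
15/06/2016,9,3.250,XYZ,ABC
20/12/2016,8,4.100,XYZ,ABC
21/12/2016,6,2.900,XYZ,ABC
05/01/2017,7,6.000,XYZ,ABC
"""
df = pd.read_csv(io.StringIO(csv_text))

# An explicit format avoids pandas guessing month-first for ambiguous dates.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# Label-based slices on a DatetimeIndex include both endpoints.
train = df.loc[:"2016-12-20"]
test = df.loc["2016-12-21":]

print("Train Dataset:", train.shape)
print("Test Dataset:", test.shape)
```

Leaving the start of the training slice open means the earliest row does not need to be known in advance.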


Sayali Sonawane


I met similar problems but my Date is already an index just I have different ids with the same interval. How to split this? – Ben10 Nov 25 '20 at 18:34
Can you please post a sample dataset? – Sayali Sonawane Nov 25 '20 at 22:15
Is `df.sort_index()` necessary? – Nermin Mar 15 '23 at 16:04

score 12 · Answer 2 · answered May 30 '16 at 20:01


Assuming that your dataset is a pandas data frame and that the Date column is of datetime dtype:

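import pandas as pd

# pd.datetime was removed in pandas 2.0; pd.Timestamp is the equivalent
split_date = pd.Timestamp(2016, 12, 20)

# rows on or before the split date go to training, later rows to test
df_training = df.loc[df['Date'] <= split_date]
df_test = df.loc[df['Date'] > split_date]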
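
The same mask-based split can be wrapped in a small helper. This is only a sketch: `split_by_date` is a hypothetical name, the sample frame is made up for illustration, and the Date column is assumed to already be of datetime dtype.

```python
import pandas as pd

def split_by_date(df, column, split_date):
    """Return (on-or-before, after) partitions of df around split_date."""
    cutoff = pd.Timestamp(split_date)
    mask = df[column] <= cutoff
    # Rows with a missing date (NaT) compare False, so they land in the
    # second partition.
    return df.loc[mask], df.loc[~mask]

# Made-up sample data with an already-parsed Date column.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-10-01", "2016-12-20", "2016-12-21", "2017-01-05"]),
    "Cost": [7, 8, 6, 7],
})

df_training, df_test = split_by_date(df, "Date", "2016-12-20")
print(len(df_training), len(df_test))
```

Using one mask and its negation guarantees that every row ends up in exactly one of the two partitions.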


MaxU - stand with Ukraine


score 1 · Answer 3 · answered May 03 '20 at 03:43

If your Date column holds datetimes in the standard format, e.g. '2016-06-23 23:00:00', you can compare it directly with a date string using the code below:

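split_date = '2016-06-23 23:00:00'
# build the validation set first: once train_data is overwritten,
# no rows after split_date are left to select from
validation_data = train_data.loc[train_data['Date'] > split_date]
train_data = train_data.loc[train_data['Date'] <= split_date]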
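
Comparing against a date string like this depends on the Date column having a datetime dtype. If the column still holds day-first strings such as those in the question, the comparison is done character by character and can give a wrong split. A small illustration with made-up values:

```python
import pandas as pd

# Made-up day/month/year strings, as in the question's sample row.
raw = pd.Series(["01/10/2015", "25/11/2016", "05/01/2017"])

# Plain string comparison is lexicographic, not chronological:
# "05/01/2017" sorts before "20/12/2016" because "0" < "2".
as_text = raw <= "20/12/2016"

# After parsing, the comparison is chronological.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
as_dates = parsed <= pd.Timestamp("2016-12-20")

print(as_text.tolist())
print(as_dates.tolist())
```

Checking `df['Date'].dtype` before splitting is a cheap way to catch this.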
