5

I am trying to split a dataframe into two based on date. This has been solved for a related problem here: Split dataframe into two on the basis of date

My dataframe looks like this:

               abcde     col_b
2008-04-10  0.041913  0.227050
2008-04-11  0.041372  0.228116
2008-04-12  0.040835  0.229199
2008-04-13  0.040300  0.230301
2008-04-14  0.039770  0.231421

How do I split it based on date (say before 2008-04-12 and after)? When I try this:

df.loc[pd.to_datetime(df.index) <= split_date]

where split_date is datetime.date(2008-04-12), I get this error:

*** TypeError: <class 'datetime.date'> type object 2008-04-12
user308827

3 Answers

5

From your question:

where split_date is datetime.date(2008-04-12), I get this error

Here, datetime.date() takes the year, month, and day as separate integer arguments, for example 2008, 4, 12, so you should write

split_date = datetime.date(2008,4,12)

In your sample input the first column has no name. If the dates are stored in that column (rather than in the index), you can access it by position like this:

df[(pd.to_datetime(df[df.columns[0]]) < split_date)]

Alternatively, give the column a name such as "date" (or whatever you want) and refer to it by name:

df[(pd.to_datetime(df["date"]) < split_date)]

Lastly, regarding the error:

TypeError: <class 'datetime.date'> type object 2008-04-12

This occurs because you are comparing a datetime.date object against the datetime values held in the DataFrame.
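As a minimal sketch of the comparison on the question's data, assuming the dates are in the index as shown there: the split date is built as a pd.Timestamp, which is what the FutureWarning quoted in the comments recommends. The names `before` and `after` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the dates are the index.
df = pd.DataFrame(
    {
        "abcde": [0.041913, 0.041372, 0.040835, 0.040300, 0.039770],
        "col_b": [0.227050, 0.228116, 0.229199, 0.230301, 0.231421],
    },
    index=["2008-04-10", "2008-04-11", "2008-04-12", "2008-04-13", "2008-04-14"],
)

# Make both sides of the comparison pandas datetime types.
df.index = pd.to_datetime(df.index)
split_date = pd.Timestamp(2008, 4, 12)

before = df.loc[df.index <= split_date]  # rows up to and including the split date
after = df.loc[df.index > split_date]    # rows strictly after the split date
```

Whether the split date itself belongs to the first or second half is a choice; swap `<=` and `>` for `<` and `>=` if it should go to the later half.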

R.A.Munna
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review – Lundin Sep 15 '17 at 06:42
  • Some explanation would be nice. – Sergio Tulentsev Sep 15 '17 at 07:55
  • 1
    (@Lundin and @Sergio) Thanks for the good suggestions; I will keep them in mind for future answers. – R.A.Munna Sep 15 '17 at 10:02
  • This worked but threw: "FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the 'datetime.date' is coerced to a datetime. In the future pandas will not coerce, and a TypeError will be raised. To retain the current behavior, convert the 'datetime.date' to a datetime with 'pd.Timestamp'. split_date = datetime.date(2008,4,12)" – R. Cox Nov 05 '20 at 13:46
1

Here is a solution: Add the label "Date" to the data file for the first column.

import pandas as pd
df = pd.read_csv('data.csv')

# 'Date' is read as strings here; ISO dates (YYYY-MM-DD) also sort correctly as text
split_date = '2008-04-12'
df_training = df.loc[df['Date'] <= split_date]
df_test = df.loc[df['Date'] > split_date]
print(df_test)

When you do a comparison such as

df.loc[pd.to_datetime(df.index) <= split_date]

both sides must be of the same type.
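One way to keep both sides of the comparison the same type is to convert them explicitly before comparing. The sketch below assumes a "Date" header has been added as described; `csv_text` is a stand-in for data.csv and `split_by_date` is a hypothetical helper, not a pandas function.

```python
import io
import pandas as pd

# Stand-in for data.csv (assumed contents, with the "Date" header added).
csv_text = """Date,abcde,col_b
2008-04-10,0.041913,0.227050
2008-04-11,0.041372,0.228116
2008-04-12,0.040835,0.229199
2008-04-13,0.040300,0.230301
2008-04-14,0.039770,0.231421
"""

def split_by_date(frame, column, split_date):
    """Return (rows on or before split_date, rows after it)."""
    dates = pd.to_datetime(frame[column])  # left side: datetime64 values
    cutoff = pd.Timestamp(split_date)      # right side: a matching Timestamp
    return frame.loc[dates <= cutoff], frame.loc[dates > cutoff]

df = pd.read_csv(io.StringIO(csv_text))
df_training, df_test = split_by_date(df, "Date", "2008-04-12")
```

Because the conversion happens inside the helper, the split date can be passed as a string, a datetime.date, or a pd.Timestamp.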

salehinejad
1

For reference, if you are looking to get the part of the dataframe between two dates, you can do this, following @R.A.Munna's logic:

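import datetime
import pandas as pd

# Wrap the dates in pd.Timestamp: newer pandas rejects comparisons with datetime.date
split_date_one = pd.Timestamp(datetime.date(2019, 9, 26))
split_date_two = pd.Timestamp(datetime.date(2019, 10, 13))

dates = pd.to_datetime(df[df.columns[0]])  # first column holds the dates
df = df[(dates >= split_date_one) & (dates <= split_date_two)]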
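A possible alternative for the same range filter is `Series.between`, which includes both endpoints by default. This is a sketch on a made-up frame; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical frame whose first column holds dates as strings.
df = pd.DataFrame(
    {
        "date": ["2019-09-20", "2019-09-26", "2019-10-01", "2019-10-13", "2019-10-20"],
        "value": [1, 2, 3, 4, 5],
    }
)

start = pd.Timestamp(2019, 9, 26)
end = pd.Timestamp(2019, 10, 13)

dates = pd.to_datetime(df[df.columns[0]])  # convert the first column once
in_range = df[dates.between(start, end)]   # keeps rows with start <= date <= end
```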
DataBach