0

I have a large dataset with a date column (which is not the index) with the following format %Y-%m-%d %H:%M:%S.

I would like to create quarterly subsets of this data frame i.e. the data frame dfQ1 would contain all rows where the date was between month [1 and 4], dfQ2 would contain all rows where the date was between month [5 and 8], etc... The header of the subsets is the same as that of the main data frame.

How can I do this?

Thanks!

finstats
  • 1,349
  • 4
  • 19
  • 31

3 Answers3

1

I would add a new column containing quarterly information, ie:

from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"
date_to_qtr = lambda dt: 1 + (datetime.strptime(dt, date_format).month-1) // 3
df['qtr'] = df['date'].apply(date_to_qtr)

(using the floordiv function). Then index on the new column:

dfQ1 = df[df.qtr == 1]
dfQ2 = df[df.qtr == 2]
dfQ3 = df[df.qtr == 3]
dfQ4 = df[df.qtr == 4]

Or, by then you can just use groupby, df.groupby("qtr") (see docs).

Community
  • 1
  • 1
metaperture
  • 2,393
  • 1
  • 18
  • 19
  • This method worked, but I think the quarters are a bit off. For example, dfQ2 contains month 3, where it should only contain months 4,5,6. Same thing for other quarters. How can I fix this? – finstats Oct 13 '14 at 13:30
  • Sorry, had an off by one (floordiv likes 0-indexed, months and quarters are 1-index). Should be fixed above. – metaperture Oct 13 '14 at 16:25
  • Thanks! this worked perfectly :). Can you explain though what the `//3` is? If i also wanted to include a "month" column and a "week" column similar to "qtr" how can I adjust the `date_to_qtr` variable? – finstats Oct 13 '14 at 18:20
1

Using pandas, you can first create a datetime column and then create a quarter column using the date/time quarter attribute:

from datetime import datetime
date_format = "%Y-%m-%d %H:%M:%S"
df['datetime'] = [datetime.strptime(dt, date_format) for dt in df['date']]
df['quarter'] = [dt.quarter for dt in df['datetime']]

From there you can subset the dataframe with groupby (df.groupby('quarter')) or by indexing:

dfQ1 = df[df.quarter == 1]
dfQ2 = df[df.quarter == 2]
dfQ3 = df[df.quarter == 3]
dfQ4 = df[df.quarter == 4]
greenren
  • 26
  • 4
  • This is more straight forward than the answer given by @metaperture. Thanks a lot, i was able to easily do this for months and weeks as well. :) – finstats Oct 17 '14 at 09:25
0

Assuming you're using Pandas.

dfQ1 = df[(df.date > Qstartdate) & (df.date < Qenddate)]

Bob Haffner
  • 8,235
  • 1
  • 36
  • 43
  • I tried putting `2013-01-01` as my Qstartdate and `2013-04-01` as my Qenddate, but i got the following error `TypeError: can't compare datetime.datetime to int`. How should i write these parameters? Sorry I am new to Python and Pandas, and I really appreciate your help. – finstats Oct 13 '14 at 13:09
  • hmmm, whats the data type of your date in your df? You can do df.dtypes to find that out. – Bob Haffner Oct 13 '14 at 15:24
  • it says type `object`.I guess its a string. I tried changing it to a date using strptime but I still had a problem. I guess i'm doing something wrong. @metaperture's method worked well though. – finstats Oct 13 '14 at 18:17
  • Ok, glad you're up and running – Bob Haffner Oct 13 '14 at 18:45