1

I am looking to take the trailing three years (last three years) of my data set. I have merged the year, month and quarter columns into one column named 'date' which is a datetime function. I would like to use his 'date' column in order to get the last 3 years depending on when I run the report.

I thought of using .tail, but my issue is that the number of rows will change and therefore wanted to do filter based on the date column.

In my code below you will see I am filtering based on year, but really i need to be more precise and filter on an exact 3 year basis- For example, if i ran the report in 2 months from now, it would give me exactly 3 years back.

df1= df.loc[df['YEAR'].isin(['2014','2015','2016'])]
monthend = {'Q1':'3/31','Q2':'6/30','Q3':'9/30','Q4':'12/31'}
df1['Date']=df1['QUARTER'].map(monthend)+'/'+ df1['YEAR']
df1['Date'] = pd.to_datetime(df1['Date'])

Dataframe looks something like this:

YEAR    MONTH   QUARTER  Fruit Sales ($)    Date
2014      2       1         12           3/31/2014
2014      4       2         12           6/30/2014
2014      9       3         13           9/30/2014
2014      12      4         11           12/31/2014
2015      1       1         3            3/31/2015
2015      4       2         54           6/30/2015
2015      8       3         11           9/30/2015
2015      11      4         43           12/31/2015
2016      3       1         54           3/31/2016
2016      5       2         34           6/30/2016
2016      7       3         6            9/30/2016
2016      10      4         7            12/31/2016
2017      1       1         57           3/31/2017

Any ideas?

spacedinosaur10
  • 695
  • 3
  • 10
  • 24
  • 1
    so what would be an output example based on that dataframe? please update post. – Zeugma Feb 06 '17 at 21:05
  • If you strictly have quaterly data, doesn't it make sense to simplify it by selecting last 12 rows(4 in a year)? – Vaishali Feb 06 '17 at 21:21
  • @Boud The dataframe above is my desired output. My original dataframe has data ranging from 2002-2017. – spacedinosaur10 Feb 06 '17 at 22:00
  • @Vaishali Garg The problem with that is i have some dates with more than one row. So it could be that i have 2 rows from Q1 2014 and three rows for Q1 2014 which i would still like to account for. – spacedinosaur10 Feb 06 '17 at 22:01

1 Answers1

3

I think we can do it with saving the (current time - 3 years) as a variable and then filtering the DataFrame with that.

so

import time
then = datetime.datetime.now() - datetime.timedelta(days=3*365)
then = pd.to_datetime(then)

now back to our dataframe

df2 = df1[(df1['Date'] > then)]

I am sure there is a more elegant solution but I think this can get the job done!

Here are the sources I used:

pandas apply filter for boolean type

filtering pandas dataframes on dates

Subtract an year from a datetime column in pandas

Community
  • 1
  • 1
Dylan
  • 417
  • 4
  • 14