7

I have a pandas dataframe where the index is the date, from year 2007 to 2017.

I'd like to calculate the mean of each weekday for each year. I am able to group by year:

groups = df.groupby(TimeGrouper('A'))
years = DataFrame()
for name, group in groups:
   years[name.year] = group.values

This is the way I create a new dataframe (years) where in each column I obtain each year of the time series. If I want to see the statistics of each years (for example, the mean):

print(years.mean())

But now I would like to separate each day of the week for each year, in order to obtain the mean of each weekday for all of then.

The only thing I know is:

year=df[(df.index.year==2007)]

day_week=df[(df.index.weekday==2)]

The problem with this is that I have to change 7 times the day of the week, and then repeat this for 11 years (my time series begins on 2007 and ends on 2017), so I must do it 77 times!

Is there a way to group time by years and weekday in order to make this faster?

Jvr
  • 563
  • 1
  • 5
  • 15

1 Answers1

4

It seems you need groupby by DatetimeIndex.year with DatetimeIndex.weekday:

rng = pd.date_range('2017-04-03', periods=10, freq='10M')
df = pd.DataFrame({'a': range(10)}, index=rng)  
print (df)
            a
2017-04-30  0
2018-02-28  1
2018-12-31  2
2019-10-31  3
2020-08-31  4
2021-06-30  5
2022-04-30  6
2023-02-28  7
2023-12-31  8
2024-10-31  9

df1 = df.groupby([df.index.year, df.index.weekday]).mean()
print (df1)
        a
2017 6  0
2018 0  2
     2  1
2019 3  3
2020 0  4
2021 2  5
2022 5  6
2023 1  7
     6  8
2024 3  9

df1 = df.groupby([df.index.year, df.index.weekday]).mean().reset_index()
df1 = df1.rename(columns={'level_0':'years','level_1':'weekdays'})
print (df1)
   years  weekdays  a
0   2017         6  0
1   2018         0  2
2   2018         2  1
3   2019         3  3
4   2020         0  4
5   2021         2  5
6   2022         5  6
7   2023         1  7
8   2023         6  8
9   2024         3  9
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • As I said, it's just what I wanted to do. Clear and useful answer. – Jvr Jun 13 '17 at 12:13
  • How would you write this using Grouper? – Niels Bom Jun 01 '21 at 10:11
  • 1
    @NielsBom - I think not possible, because `df.groupby([pd.Grouper(freq='A'), pd.Grouper(freq='W')]).mean()` seems like solution, but for `week`s are returned periods of each weeks instead `0-7` values, so failed. And cannot find altrenative in [this](https://pandas.pydata.org/docs/user_guide/timeseries.html#dateoffset-objects) for it. – jezrael Jun 01 '21 at 10:23