Understanding resampling of datetime in pandas

Question

I have a question regarding resampling of DataFrames.

import pandas as pd
df = pd.DataFrame([['2005-01-20', 10], ['2005-01-21', 20], 
                   ['2005-01-27', 40], ['2005-01-28', 50]],
                   columns=['date', 'num'])

# Convert the column to datetime 
df['date'] = pd.to_datetime(df['date'])

# Resample and aggregate results by week
df = df.resample('W', on='date')['num'].sum().reset_index()

print(df.head())
# OUTPUT: 
#         date  num
# 0 2005-01-23   30
# 1 2005-01-30   90

Everything works as expected, but I would like to better understand what exactly resample(),['num'] and sum() do here.

QUESTION #1

Why the following happens:

The result of df.resample('W', on='date') is DatetimeIndexResampler.

The result of df.resample('W', on='date')['num'] is pandas.core.groupby.SeriesGroupBy.

The result of df.resample('W', on='date')['num'].sum() is

date
2005-01-23    30
2005-01-30    90
Freq: W-SUN, Name: num, dtype: int64

QUESTION #2

Is there a way to produce the same results without resampling? For example, using groupby.

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

3

Answer1

As the docs says, .resample returns a Resampler Object. Hence you get DatetimeIndexResampler because date is a datetime object.
Now, you get <pandas.core.groupby.SeriesGroupBy because you are looking for Series from the dataframe based of off the Resampler object.

Oh by the way,

df.groupby([pd.Grouper(key='date', freq='W-SUN')])['num'] Would return <pandas.core.groupby.SeriesGroupBy as well.
Now when you do .sum(), you are getting the sum over the requested axis of the dataframe. You get a Series because you are doing sum over the pandas.core.series.Series.

Answer2

You can achieve results using groupby with the help from Grouper as follow:

df.groupby([pd.Grouper(key='date', freq='W-SUN')])['num'].sum()

Output:

date
2005-01-23    30
2005-01-30    90
Name: num, dtype: int64

edited Jun 20 '20 at 09:12

Community

1
1

answered May 23 '18 at 15:56

harvpan

8,571
2
18
36

Thank you for your reply! Do we retrieve columns at step 2 that we previously grouped by `date`? If we retrieve multiple columns, not only `num`, can we use different aggregation methods like `sum()` and `mean()` for different columns? – Konstantin May 24 '18 at 09:59
1

@Konstantin, yes, you can as many as columns as you need. Apply `.sum()` or `.mean()`. For example, if you had some another column like `val`. You get the some using `df.resample('W', on='date')[['num', 'val']].sum()` for both the columns. – harvpan May 24 '18 at 13:53
But is it possible to use `.sum()` for one of them and `.mean()` for another? – Konstantin May 24 '18 at 14:52
1

@Konstantin, absolutely. https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns – harvpan May 24 '18 at 15:28
@Konstantin Glad to help. Happy coding. – harvpan May 24 '18 at 15:55

Understanding resampling of datetime in pandas

1 Answers1