After review, a much simpler solution can be used. Please refer to subsection labeled New Solution in Part 1 below.
This task requires multiple steps. Let's break it down as follows:
Part 1: Transform Date & Resample
New Solution
With consideration that the weekly frequency required, being Sunday based (i.e. freq='W-SUN'
) is independent for each month and is not related to or affected by any adjacent month(s), we can directly use the year-month values in column Date
to generate date ranges in weekly basis in one step rather than breaking into 2 steps by first generating daily date ranges from year-month and then resample the daily date ranges to weekly afterwards.
The new program logics just needs to use pd.date_range()
with freq='W'
with the help of pd.offsets.MonthEnd()
to generate weekly frequency for a month. Altogether, it does not need to call .resample()
or .asfreq()
like other solutions. Effectively, the pd.date_range()
with freq='W'
is doing the resampling task for us.
Here goes the codes:
df['Weeks'] = df['Date'].map(lambda x:
pd.date_range(
start=pd.to_datetime(x),
end=(pd.to_datetime(x) + pd.offsets.MonthEnd()),
freq='W'))
df = df.explode('Weeks')
Result:
print(df)
ID Date Volume Sales Weeks
0 1 2020-02 10 4 2020-02-02
0 1 2020-02 10 4 2020-02-09
0 1 2020-02 10 4 2020-02-16
0 1 2020-02 10 4 2020-02-23
1 1 2020-03 8 6 2020-03-01
1 1 2020-03 8 6 2020-03-08
1 1 2020-03 8 6 2020-03-15
1 1 2020-03 8 6 2020-03-22
1 1 2020-03 8 6 2020-03-29
2 2 2020-02 6 8 2020-02-02
2 2 2020-02 6 8 2020-02-09
2 2 2020-02 6 8 2020-02-16
2 2 2020-02 6 8 2020-02-23
3 2 2020-03 4 10 2020-03-01
3 2 2020-03 4 10 2020-03-08
3 2 2020-03 4 10 2020-03-15
3 2 2020-03 4 10 2020-03-22
3 2 2020-03 4 10 2020-03-29
By the 2 lines of codes above, we already get the required result for Part 1. We don't need to go through the complicated codes of .groupby()
and .resample()
in the old solution.
We can continue to go to Part 2. As we have not created the grouped
object, we can either replace grouped
by df
in for the codes in Part 2 or add a new line grouped = df
to continue.
Old Solution
We use pd.date_range()
with freq='D'
with the help of pd.offsets.MonthEnd()
to produce daily entries for the full month. Then transform these full month ranges to index before resampling to week frequency. Resampled with closed='left'
to exclude the unwanted week of 2020-04-05 produced under default resample()
parameters.
df['Weeks'] = df['Date'].map(lambda x:
pd.date_range(
start=pd.to_datetime(x),
end=(pd.to_datetime(x) + pd.offsets.MonthEnd()),
freq='D'))
df = df.explode('Weeks').set_index('Weeks')
grouped = (df.groupby(['ID', 'Date'], as_index=False)
.resample('W', closed='left')
.ffill().dropna().reset_index(-1))
Result:
print(grouped)
Weeks ID Date Volume Sales
0 2020-02-02 1.0 2020-02 10.0 4.0
0 2020-02-09 1.0 2020-02 10.0 4.0
0 2020-02-16 1.0 2020-02 10.0 4.0
0 2020-02-23 1.0 2020-02 10.0 4.0
1 2020-03-01 1.0 2020-03 8.0 6.0
1 2020-03-08 1.0 2020-03 8.0 6.0
1 2020-03-15 1.0 2020-03 8.0 6.0
1 2020-03-22 1.0 2020-03 8.0 6.0
1 2020-03-29 1.0 2020-03 8.0 6.0
2 2020-02-02 2.0 2020-02 6.0 8.0
2 2020-02-09 2.0 2020-02 6.0 8.0
2 2020-02-16 2.0 2020-02 6.0 8.0
2 2020-02-23 2.0 2020-02 6.0 8.0
3 2020-03-01 2.0 2020-03 4.0 10.0
3 2020-03-08 2.0 2020-03 4.0 10.0
3 2020-03-15 2.0 2020-03 4.0 10.0
3 2020-03-22 2.0 2020-03 4.0 10.0
3 2020-03-29 2.0 2020-03 4.0 10.0
Here, we retain the column Date
for some use later.
Part 2: Divide Volume and Sales by number of weeks in month
Here, the number of weeks in month used to divide the Volume and Sales figures should actually be the number of resampled weeks within the month as shown in the interim result above.
If we use the actual number of weeks, then for Feb 2020, because of leap year, it has 29 days in that month and thus it actually spans across 5 weeks instead of the 4 resampled weeks in the interim result above. Then it would cause inconsistent results because there are only 4 week entries above while we divide each Volume and Sales figure by 5.
Let's go to the codes then:
We group by columns ID
and Date
and then divide each value in columns Volume
and Sales
by group size (i.e. number of resampled weeks).
grouped[['Volume', 'Sales']] = (grouped.groupby(['ID', 'Date'])[['Volume', 'Sales']]
.transform(lambda x: x / x.count()))
or simplified form using /=
as follows:
grouped[['Volume', 'Sales']] /= (grouped.groupby(['ID', 'Date'])[['Volume', 'Sales']]
.transform('count'))
Result:
print(grouped)
Weeks ID Date Volume Sales
0 2020-02-02 1.0 2020-02 2.5 1.0
0 2020-02-09 1.0 2020-02 2.5 1.0
0 2020-02-16 1.0 2020-02 2.5 1.0
0 2020-02-23 1.0 2020-02 2.5 1.0
1 2020-03-01 1.0 2020-03 1.6 1.2
1 2020-03-08 1.0 2020-03 1.6 1.2
1 2020-03-15 1.0 2020-03 1.6 1.2
1 2020-03-22 1.0 2020-03 1.6 1.2
1 2020-03-29 1.0 2020-03 1.6 1.2
2 2020-02-02 2.0 2020-02 1.5 2.0
2 2020-02-09 2.0 2020-02 1.5 2.0
2 2020-02-16 2.0 2020-02 1.5 2.0
2 2020-02-23 2.0 2020-02 1.5 2.0
3 2020-03-01 2.0 2020-03 0.8 2.0
3 2020-03-08 2.0 2020-03 0.8 2.0
3 2020-03-15 2.0 2020-03 0.8 2.0
3 2020-03-22 2.0 2020-03 0.8 2.0
3 2020-03-29 2.0 2020-03 0.8 2.0
Optionally, you can do some cosmetic works to drop the column Date
and rearrange column Weeks
to your desired position if you like.
Edit: (Similarity and difference from other questions resampling from month to week)
In this review, I have searched some other questions of similar titles and compared the questions and solutions.
There is another question with similar requirement to split the monthly values equally to weekly values according to the number of weeks in the resampled month. In that question, the months are represented as the first date of the months and they are in datetime format and used as index in the dataframe while in this question, the months are represented as YYYY-MM
which can be of string type.
A big and critical difference is that in that question, the last month period index 2018-05-01 with value 22644 was actually not processed. That is, the month of 2018-05 is not resampled into weeks in May 2018 and the value 22644 has never been processed to split into weekly proportions. The accepted solution using .asfreq()
does not show any entry for 2018-05 at all and the other solution using .resample()
still keeps one (un-resampled) entry for 2018-05 and the value 22644 is not split into weekly proportions.
However, in our question here, the last month listed in each group still needs to be resampled into weeks and values split equally for the resampled weeks.
Looking at the solution, my new solution makes no call to .resample()
nor .asfreq()
. It just uses pd.date_range()
with freq='W'
with the help of pd.offsets.MonthEnd()
to generate weekly frequency for a month based on 'YYYY-MM' values. This is what I could not imagine of when I worked on the old solution making use of .resample()