You can try:
df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
lambda x: pd.Series({"Device_num": x.name[0],
"sale_type": x.name[1],
"1st_27dayssale": (x.Date.dt.day <= 27).sum(),
"Tot_sale": len(x),
"last4dayscount" : (x.Date.dt.day > 27).sum()})) \
.reset_index(drop=True)
Explanations:
The dataset is grouped by 3 keys using groupby
:
- The
Device_num
since you want a row for each device num
- The
sale_type
because we also want a row for each sale_type
- The current month using
pd.Grouper
. In pd.grouper
, the argument key
selects the column used in the grouper and the freq
stands for the groupby frequency
. Here we use m
to group by month. For more details, have a look at this topic
Then for each above group, we will want to return the 5 expected columns. pd.apply
let use perform a function to each group. We want our function return:
Device_num
: this a key from the group, so we call name
to get them. And because it's the first key, we finally use x.name[0]
. This discussion explains how to get keys from groupby.
sale_type
: x.name[1]
(idem as Device_num
)
1st_27dayssale
: we access the day from the date using x.Date.dt.day
and it's compared with 27. Because we actually want the number of values, we perform .sum
to count them : (x.Date.dt.day <= 27).sum()
Tot_sale
: the number of row in the group e.g. len(x)
last4dayscount
: (x.Date.dt.day > 27).sum()
(idem as 1st_27dayssale
)
Using reset_index
with drop=True
to remove the multi index from the groupby.
Full code:
# Be sure Date is a date type
df["Date"] = pd.to_datetime(df.Date, format="%d/%m/%Y")
print(df)
# Device_num Date sale_type
# 0 2344 2020-03-01 BP
# 1 2344 2020-03-12 BP
# 2 2344 2020-03-15 BP
# 3 2344 2020-03-20 BP
# 4 2344 2020-03-28 BP
# 5 3455 2020-03-10 Retailer
# 6 3455 2020-03-15 Retailer
# 7 3455 2020-03-31 Retailer
out = df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
lambda x: pd.Series({"Device_num": x.name[0],
"sale_type": x.name[1],
"1st_27dayssale": (x.Date.dt.day <= 27).sum(),
"Tot_sale": len(x),
"last4dayscount" : (x.Date.dt.day > 27).sum()})) \
.reset_index(drop=True)
print(out)
# Device_num sale_type 1st_27dayssale Tot_sale last4dayscount
# 0 2344 BP 4 5 1
# 1 3455 Retailer 2 3 1