How to get last four days sale count in particular month and first 27 day's sale count?

Question

How to get each device number sale count and filter who had done a sales more than the first 27 days count(01/Mar/2020 - 27/Mar/2020) < last 4 days count(28/Mar/2020 - 31/Mar/2020).

Device_num      Date        sale_type
2344            1/3/2020      BP
2344            12/3/2020     BP
2344            15/3/2020     BP
2344            20/3/2020     BP
2344            28/3/2020     BP
3455            10/3/2020     Retailer
3455            15/3/2020     Retailer
3455            31/3/2020     Retailer 


output table
Device_num  sale_type   1st_27dayssale  Tot_sale last4dayscount
2344          BP             4            5           1           
3455          Retailer       2            3           1

i need to three columns one is the total sale count groupby Device_num, the second is last four days sale count groupby Device_num and the third one is first 27 days sale count groupby Device_num — Shulaz Shan, Apr 15 '20 at 10:25
ahh sorry @seven_seas. it's stored in CSV format (excel file) — Shulaz Shan, Apr 15 '20 at 11:05
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211705/discussion-between-shulaz-shan-and-seven-seas). — Shulaz Shan, Apr 15 '20 at 11:06
Can you add the expected output and clarify what you want more precisely ? — Alexandre B., Apr 15 '20 at 16:52
@AlexandreB. I updated, I am expecting the output table, not average i need only sale count — Shulaz Shan, Apr 16 '20 at 02:54
@AlexandreB. small clarification on ur code, I need to get the last three days sales count in each month. How to get that? because each month days count different (i.e if take march 31 days, if take April 30 days) — Shulaz Shan, Mar 02 '21 at 16:02

score 1 · Accepted Answer · answered Apr 16 '20 at 07:20

You can try:

df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
    lambda x: pd.Series({"Device_num":  x.name[0],
                         "sale_type": x.name[1],
                         "1st_27dayssale": (x.Date.dt.day <= 27).sum(),
                         "Tot_sale": len(x),
                         "last4dayscount" : (x.Date.dt.day > 27).sum()})) \
    .reset_index(drop=True)

Explanations:

The dataset is grouped by 3 keys using groupby:
- The Device_num since you want a row for each device num
- The sale_type because we also want a row for each sale_type
- The current month using pd.Grouper. In pd.grouper, the argument key selects the column used in the grouper and the freq stands for the groupby frequency. Here we use m to group by month. For more details, have a look at this topic
Then for each above group, we will want to return the 5 expected columns. pd.apply let use perform a function to each group. We want our function return:
- Device_num: this a key from the group, so we call name to get them. And because it's the first key, we finally use x.name[0]. This discussion explains how to get keys from groupby.
- sale_type: x.name[1] (idem as Device_num)
- 1st_27dayssale: we access the day from the date using x.Date.dt.day and it's compared with 27. Because we actually want the number of values, we perform .sum to count them : (x.Date.dt.day <= 27).sum()
- Tot_sale: the number of row in the group e.g. len(x)
- last4dayscount: (x.Date.dt.day > 27).sum() (idem as 1st_27dayssale)
Using reset_index with drop=True to remove the multi index from the groupby.

Full code:

# Be sure Date is a date type
df["Date"] = pd.to_datetime(df.Date, format="%d/%m/%Y")
print(df)
#    Device_num       Date sale_type
# 0        2344 2020-03-01        BP
# 1        2344 2020-03-12        BP
# 2        2344 2020-03-15        BP
# 3        2344 2020-03-20        BP
# 4        2344 2020-03-28        BP
# 5        3455 2020-03-10  Retailer
# 6        3455 2020-03-15  Retailer
# 7        3455 2020-03-31  Retailer


out = df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
    lambda x: pd.Series({"Device_num":  x.name[0],
                         "sale_type": x.name[1],
                         "1st_27dayssale": (x.Date.dt.day <= 27).sum(),
                         "Tot_sale": len(x),
                         "last4dayscount" : (x.Date.dt.day > 27).sum()})) \
    .reset_index(drop=True)
print(out)
#    Device_num sale_type  1st_27dayssale  Tot_sale  last4dayscount
# 0        2344        BP               4         5               1
# 1        3455  Retailer               2         3               1

i want to get the last three days sales to count on each month (not only march), how to get it? (ex: if u take Feb, last 4 days count(27/Feb/2020 - 29/Feb/2020). — Shulaz Shan, Mar 04 '21 at 04:10
@ShulazShan Please, instead of digging up an old subject, open a new one with what you tried and where you're stuck. — Alexandre B., Mar 04 '21 at 16:05

How to get last four days sale count in particular month and first 27 day's sale count?

1 Answers1