0

How to get each device number sale count and filter who had done a sales more than the first 27 days count(01/Mar/2020 - 27/Mar/2020) < last 4 days count(28/Mar/2020 - 31/Mar/2020).

Device_num      Date        sale_type
2344            1/3/2020      BP
2344            12/3/2020     BP
2344            15/3/2020     BP
2344            20/3/2020     BP
2344            28/3/2020     BP
3455            10/3/2020     Retailer
3455            15/3/2020     Retailer
3455            31/3/2020     Retailer 


output table
Device_num  sale_type   1st_27dayssale  Tot_sale last4dayscount
2344          BP             4            5           1           
3455          Retailer       2            3           1
Shulaz Shan
  • 175
  • 8

1 Answers1

1

You can try:

df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
    lambda x: pd.Series({"Device_num":  x.name[0],
                         "sale_type": x.name[1],
                         "1st_27dayssale": (x.Date.dt.day <= 27).sum(),
                         "Tot_sale": len(x),
                         "last4dayscount" : (x.Date.dt.day > 27).sum()})) \
    .reset_index(drop=True)

Explanations:

  1. The dataset is grouped by 3 keys using groupby:

    • The Device_num since you want a row for each device num
    • The sale_type because we also want a row for each sale_type
    • The current month using pd.Grouper. In pd.grouper, the argument key selects the column used in the grouper and the freq stands for the groupby frequency. Here we use m to group by month. For more details, have a look at this topic
  2. Then for each above group, we will want to return the 5 expected columns. pd.apply let use perform a function to each group. We want our function return:

    • Device_num: this a key from the group, so we call name to get them. And because it's the first key, we finally use x.name[0]. This discussion explains how to get keys from groupby.
    • sale_type: x.name[1] (idem as Device_num)
    • 1st_27dayssale: we access the day from the date using x.Date.dt.day and it's compared with 27. Because we actually want the number of values, we perform .sum to count them : (x.Date.dt.day <= 27).sum()
    • Tot_sale: the number of row in the group e.g. len(x)
    • last4dayscount: (x.Date.dt.day > 27).sum() (idem as 1st_27dayssale)
  3. Using reset_index with drop=True to remove the multi index from the groupby.

Full code:

# Be sure Date is a date type
df["Date"] = pd.to_datetime(df.Date, format="%d/%m/%Y")
print(df)
#    Device_num       Date sale_type
# 0        2344 2020-03-01        BP
# 1        2344 2020-03-12        BP
# 2        2344 2020-03-15        BP
# 3        2344 2020-03-20        BP
# 4        2344 2020-03-28        BP
# 5        3455 2020-03-10  Retailer
# 6        3455 2020-03-15  Retailer
# 7        3455 2020-03-31  Retailer


out = df.groupby(["Device_num", "sale_type", pd.Grouper(key='Date', freq="m")]).apply(
    lambda x: pd.Series({"Device_num":  x.name[0],
                         "sale_type": x.name[1],
                         "1st_27dayssale": (x.Date.dt.day <= 27).sum(),
                         "Tot_sale": len(x),
                         "last4dayscount" : (x.Date.dt.day > 27).sum()})) \
    .reset_index(drop=True)
print(out)
#    Device_num sale_type  1st_27dayssale  Tot_sale  last4dayscount
# 0        2344        BP               4         5               1
# 1        3455  Retailer               2         3               1
Alexandre B.
  • 5,387
  • 2
  • 17
  • 40
  • i want to get the last three days sales to count on each month (not only march), how to get it? (ex: if u take Feb, last 4 days count(27/Feb/2020 - 29/Feb/2020). – Shulaz Shan Mar 04 '21 at 04:10
  • @ShulazShan Please, instead of digging up an old subject, open a new one with what you tried and where you're stuck. – Alexandre B. Mar 04 '21 at 16:05