0

So I have the following table in the pandas tracking dollar sales for vegetables per week

datetime     | vegetable | sales (amount sold)
2020-01-06.   carrot.     100
2020-01-13.   carrot.     200
2020-01-20.   carrot.     450
.
.
.
2020-03-23.   carrot.     250
2020-01-06.   onion.      40
2020-01-13.   onion.      80
2020-01-20.   onion.      10
.
.
.
2020-04-27.   onion.      60
2020-01-06.   lettuce     40
2020-01-13.   lettuce     0
2020-01-20.   lettuce.    0
2020-01-28.   lettuce.    0
.
.
.
2020-03-16.   lettuce     50

Is it possible to group by vegetable and filter out the vegetables where a particular vegetable was sold 0 times 1 - 2 weeks within a one month (4 week) time window. So for example lettuce was not sold 3 out of 4 of the weeks in January and thus should not be in the final table. Would I need an extra column indicator month number or week number (or both) for the filter or is it possible by iterating through the ds's in a vegetable group by and going 4 datestamps at a time?

SooWoo
  • 71
  • 1
  • 6
  • You can use `groupby` and `sum` (or `agg`)- see also [this question](https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum). – Gerd May 11 '20 at 13:34

1 Answers1

0

try this:

import pandas as pd

df = pd.DataFrame(
    data=[
        ['2020-01-06', 'carrot', 100],
        ['2020-01-13', 'carrot', 200],
        ['2020-01-20', 'carrot', 450],
        ['2020-03-23', 'carrot', 250],
        ['2020-01-06', 'onion', 40],
        ['2020-01-13', 'onion', 80],
        ['2020-01-20', 'onion', 10],
        ['2020-04-27', 'onion', 60],
        ['2020-01-06', 'lettuce', 40],
        ['2020-01-13', 'lettuce', 0],
        ['2020-01-20', 'lettuce', 0],
        ['2020-01-28', 'lettuce', 0],
        ['2020-03-16', 'lettuce', 50, ]],
    columns=['datetime', 'vegetable', 'sales']
)

df['datetime'] = pd.to_datetime(df.datetime)
df['datetime_month'] = df.datetime.map(lambda x: x.strftime('%Y-%m'))
df['has_sales'] = df.sales.apply(lambda x: 1 if x > 0 else 0)

month_df = df.groupby(
    ['datetime_month', 'vegetable']
).agg(n_weeks_w_sales=('has_sales', 'sum')).reset_index()

month_df = month_df[month_df.n_weeks_w_sales > 2]

df.merge(
    right=month_df,
    on=['datetime_month', 'vegetable'],
    how='inner'
).drop(columns=['has_sales', 'n_weeks_w_sales', 'datetime_month'])
kait
  • 1,327
  • 9
  • 13