I have a pandas dataframe like following
buyer_id item_id order_id date qty_purchased qty_bought
210 82 470 2016-01-02 5 1
169 57 475 2016-01-02 5 1
169 83 475 2016-01-02 5 1
228 82 520 2016-01-03 4 1
228 86 520 2016-01-03 4 1
228 90 520 2016-01-03 4 1
229 57 521 2016-01-03 4 1
232 82 525 2016-01-04 4 3
210 90 526 2016-01-04 4 1
210 91 526 2016-01-04 5 1
210 15 526 2016-01-05 5 1
233 90 527 2016-01-05 4 1
I I want to find item_id
introduced on each date and if item_id is introduced on more than 1 date then I want to find the ratio of `(qty_bought/qty_purchased) on each date.
My desired output is as follows
Item_id date 1st Introduced Ratio Date 2nd Introduced Ratio Date 3rd Introduced Ratio Flag
82 2016-01-02 1/5 2016-01-03 1/4 2016-01-04 3/4 1
Condition for a flag is when ratio is greater than previous dates then it should be set to 1 or else 0
If I have item introduced on 5 different dates then this should dynamically generate 5 dates and ratio columns. Ratio will be specific to that date. i want to list only those item_id
which has introduced more than once.
This is my attempt in python
df.groupby('item_id')['date'].apply(lambda x: np.unique(x.tolist()))
this gives me list of item_id
and the dates they were introduced on. Now I want to subset above if the item has introduced on more than 1 date.
df.groupby('item_id').apply(lambda r: r['date'].unique().shape[0] > 1)
This gives me all the item_id
introduced on more than 1 date. But I didn't get how to make a dataframe with the desired output and how to add date & ratio
column dynamically depending upon no of dates they were introduced on. Please help