This is an extension to my previous question. I have a source dataframe with three columns: Customer, Date and Item. I want to add a new column, Item History, containing an array of all the Items for that Customer from earlier rows (where "earlier" is defined by Date). Where a customer has made multiple purchases on the same date, neither row's item should be listed in the item history for the other.
So, given this sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer': ['Bert', 'Bert', 'Bert', 'Bert', 'Bert', 'Ernie', 'Ernie', 'Ernie', 'Ernie', 'Steven', 'Steven'],
    'Date': ['01/01/2019', '15/01/2019', '20/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019'],
    'Item': ['Bread', 'Cheese', 'Apples', 'Pears', 'Toothbrush', 'Toys', 'Shellfish', 'Dog', 'Yoghurt', 'Toilet', 'Dominos'],
})
Customer Date Item
Bert 01/01/2019 Bread
Bert 15/01/2019 Cheese
Bert 20/01/2019 Apples
Bert 20/01/2019 Pears
Bert 22/01/2019 Toothbrush
Ernie 01/01/2019 Toys
Ernie 15/01/2019 Shellfish
Ernie 20/01/2019 Dog
Ernie 22/01/2019 Yoghurt
Steven 01/01/2019 Toilet
Steven 15/01/2019 Dominos
The output I'd like to see would be:
Customer Date Item Item History
Bert 01/01/2019 Bread NaN
Bert 15/01/2019 Cheese [Bread]
Bert 20/01/2019 Apples [Bread, Cheese]
Bert 20/01/2019 Pears [Bread, Cheese]
Bert 22/01/2019 Toothbrush [Bread, Cheese, Apples, Pears]
Ernie 01/01/2019 Toys NaN
Ernie 15/01/2019 Shellfish [Toys]
Ernie 20/01/2019 Dog [Toys, Shellfish]
Ernie 22/01/2019 Yoghurt [Toys, Shellfish, Dog]
Steven 01/01/2019 Toilet NaN
Steven 15/01/2019 Dominos [Toilet]
Note that for Bert's purchases on 20/01/2019, neither's History column contains the other's item. For his 22/01/2019 purchase, both of the items from 20/01/2019 are included.
The answer to the previous question is a nifty bit of list comprehension, in the form:
df['Item History'] = [x.Item[:i].tolist() for j, x in df.groupby('Customer')
                      for i in range(len(x))]
df.loc[~df['Item History'].astype(bool), 'Item History']= np.nan
But obviously the "i" in x.Item[:i] needs to stop at the last row whose Date is earlier than the current row's, rather than simply at the previous row. Any advice on achieving that is much appreciated.
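To make the requirement concrete, here is a rough sketch of one possible direction, not a definitive answer. It assumes the rows are already in date order within each customer (as in the sample data) and that the dates are day-first strings. The helper name item_history is made up for illustration. For each row, the slice end is the number of rows in the same group with a strictly earlier date, which np.searchsorted with side='left' gives directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer': ['Bert', 'Bert', 'Bert', 'Bert', 'Bert', 'Ernie', 'Ernie', 'Ernie', 'Ernie', 'Steven', 'Steven'],
    'Date': ['01/01/2019', '15/01/2019', '20/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019', '20/01/2019', '22/01/2019', '01/01/2019', '15/01/2019'],
    'Item': ['Bread', 'Cheese', 'Apples', 'Pears', 'Toothbrush', 'Toys', 'Shellfish', 'Dog', 'Yoghurt', 'Toilet', 'Dominos'],
})


def item_history(group):
    """Hypothetical helper: one history entry per row of a single customer's group."""
    # Assumption: the group's rows are already sorted by date, and dates are day-first.
    dates = pd.to_datetime(group['Date'], format='%d/%m/%Y').to_numpy()
    # side='left' returns, for each date, how many rows have a strictly earlier date,
    # so rows sharing a date get the same cut-off and exclude each other.
    cuts = np.searchsorted(dates, dates, side='left')
    items = group['Item'].tolist()
    history = [items[:c] if c else np.nan for c in cuts]
    # Keep the original index so the result aligns with df on assignment.
    return pd.Series(history, index=group.index, dtype=object)


df['Item History'] = pd.concat([item_history(g) for _, g in df.groupby('Customer')])
print(df)
```

Because each group's result keeps its original index, the assignment aligns by label rather than by position, so this does not depend on the dataframe being sorted by Customer. It does still depend on the date ordering within each customer; if that is not guaranteed, the dataframe would need sorting first.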