I have a slightly odd pandas group by question.
I have a source dataframe, which has three columns: Customer, Date and Item. I want to add a new column that contains Item History, being an array of all the Items for that Customer that are in earlier (defined by the Date) rows. For example given this source dataframe:
Customer Date Item
Bert 01/01/2019 Bread
Bert 15/01/2019 Cheese
Bert 20/01/2019 Apples
Bert 22/01/2019 Pears
Ernie 01/01/2019 Buzz Lightyear
Ernie 15/01/2019 Shellfish
Ernie 20/01/2019 A pet dog
Ernie 22/01/2019 Yoghurt
Steven 01/01/2019 A golden toilet
Steven 15/01/2019 Dominoes
I want to create this history feature:
Customer Date Item Item History
Bert 01/01/2019 Bread NaN
Bert 15/01/2019 Cheese [Bread]
Bert 20/01/2019 Apples [Bread, Cheese]
Bert 22/01/2019 Pears [Bread, Cheese, Apples]
Ernie 01/01/2019 Buzz Lightyear NaN
Ernie 15/01/2019 Shellfish [Buzz Lightyear]
Ernie 20/01/2019 A pet dog [Buzz Lightyear, Shellfish]
Ernie 22/01/2019 Yoghurt [Buzz Lightyear, Shellfish, A pet dog]
Steven 01/01/2019 A golden toilet NaN
Steven 15/01/2019 Dominoes [A golden toilet]
I can do the following to get the History by date:
df.groupby(['Customer', 'Date']).agg(lambda x: tuple(x)).applymap(list).reset_index()
So that where a customer purchased multiple items on a single day, they're all listed in an array and where a customer purchased just one item that's alone in its own array, but I can't figure out how to concatenate them with the earlier rows.