I have a dataframe like this:
index customerID item_tag orderID Amount
0 23 A 1 34.50
1 55 B 2 11.22
2 23 A 3 9.34
3 55 D 4 123.44
4 55 F 5 231.40
I also have a list containing item_tags like this:
my_list = ['A', 'B', 'D']
Now I want to check how many item types from the my_list
were ordered by each customer.
For example, for customer 23 this number would be = 1, as customer 23 order only item tagged as A, but not B nor D.
Customer 55, however, ordered item B and D, so this indicator variable would be 2, as two item types from the my_list
are present in his orders. (he also ordered item F, but this item is not in the my_list
).
So far I tried groupby([customerId, item_tag], as_index = False).count()
, but this requires creating new dataframe (maybe not necessarily?) and then using if statement for each element in a list, but I suspect there is a more elegant way. I could not find any though, neither in Google nor here.
My dataframe has million of rows so I am looking for most efficient solution.
As a result, I want dataframe like this:
index customerID if_A if_B if_D sum_in_list
0 23 1 0 0 1
1 55 0 1 1 2