By grouped data I mean the following: assume we have a data set that is grouped by a single feature, e.g. customer data grouped by customer:

Customer | Purchase Nr | Item          | Paid Amount ($)
1          1             TShirt          15
1          2             Trousers        25
1          3             Scarf           10
2          1             Underwear       5
2          2             Dress           35
2          3             Trousers        30
2          4             TShirt          10
3          1             TShirt          8
3          2             Socks           5
4          1             Shorts          13

I want to find clusters such that all of a customer's purchases fall into one single cluster; in other words, a customer must not appear in two clusters.

I thought about collapsing the data set to one row per customer with a groupby, but it is difficult to express all the column information for one customer in a single column. Further, the order of purchases is important to me, e.g. whether a T-shirt was bought first or second.
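To make the groupby idea concrete, here is a minimal sketch, assuming pandas. The column names (`customer`, `purchase_nr`, `item`, `paid_amount`) and the variable names are made up for illustration. It shows two ways of getting one row per customer while keeping the purchase order: list-valued columns, and a wide layout with one column per purchase position.

```python
import pandas as pd

# The example table from above; column names are illustrative assumptions.
df = pd.DataFrame({
    "customer":    [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
    "purchase_nr": [1, 2, 3, 1, 2, 3, 4, 1, 2, 1],
    "item": ["TShirt", "Trousers", "Scarf", "Underwear", "Dress",
             "Trousers", "TShirt", "TShirt", "Socks", "Shorts"],
    "paid_amount": [15, 25, 10, 5, 35, 30, 10, 8, 5, 13],
})

# Option 1: one row per customer, each cell holding an ordered list.
# Sorting first keeps the purchase order inside every list.
per_customer = (
    df.sort_values(["customer", "purchase_nr"])
      .groupby("customer")
      .agg(items=("item", list), amounts=("paid_amount", list))
)

# Option 2: one row per customer, one column per (feature, purchase position).
# Customers with fewer purchases get NaN in the unused positions.
wide = df.pivot(index="customer", columns="purchase_nr",
                values=["item", "paid_amount"])

print(per_customer)
print(wide)
```

Either layout guarantees that a customer ends up in exactly one cluster, because each customer is a single row. The list-valued form still needs a distance measure that can compare sequences of different length, and the wide form needs some handling of the missing positions and of the categorical item columns before a standard algorithm can be applied.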

Is there any clustering algorithm that takes group information like this into account?

Thank you!

  • See: https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – Nilanka Manoj Mar 05 '20 at 09:26
  • @NilankaManoj that's a good point, though afterwards I want to cluster the rows, which then contain lists. K-Means obviously does not work then. Which clustering algorithm would you propose? – Tanja Pfaffel Mar 05 '20 at 09:34
  • Hey @TanjaPfaffel, what did you end up doing at the end? I'm facing the same scenario... – JohnDoe_Scientist Mar 25 '22 at 21:27

0 Answers