
I am trying to run machine learning models to segment customers based on the products they use together. My dataset is huge, with 2.4 million records, in the following format:

customer_id prod_1 prod_2 prod_3 prod_4  ..... prod_10
000           1      0      0      1     .....  1
001           0      0      1      1     .....  1
011           0      1      0      1     .....  0
021           1      0      1      1     .....  0
...

Each row has a customer number followed by a 1 or 0 for each product, depending on whether or not the customer has that product. I ran k-means and the results were not impressive.

Are there other types of models that can be run on data like this to segment customers based on the products they use together?
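One possible reason plain k-means disappoints on this kind of data (an assumption, not something the asker confirmed): standard k-means minimizes squared Euclidean distance, and on 0/1 rows Euclidean distance only counts mismatched products. It does not look at how many products two customers actually share. A set-based measure such as Jaccard distance ignores products that neither customer holds. The sketch below uses four hypothetical rows that are not from the dataset, with hypothetical helper names `euclidean` and `jaccard_distance`:

```python
# Hypothetical 0/1 product flags for four products (illustrative, not from the dataset)
a = [1, 0, 0, 0]
b = [0, 1, 0, 0]
c = [1, 1, 1, 0]
d = [1, 1, 0, 1]


def euclidean(x, y):
    """Euclidean distance; on 0/1 vectors this is sqrt(number of mismatched products)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5


def jaccard_distance(x, y):
    """1 - (products both hold) / (products either holds); joint zeros are ignored."""
    both = sum(1 for xi, yi in zip(x, y) if xi and yi)
    either = sum(1 for xi, yi in zip(x, y) if xi or yi)
    # Two customers with no products at all are treated as identical
    return 1.0 - both / either if either else 0.0


if __name__ == "__main__":
    print(euclidean(a, b), euclidean(c, d))                # same value for both pairs
    print(jaccard_distance(a, b), jaccard_distance(c, d))  # prints 1.0 0.5
```

Both pairs are the same Euclidean distance apart (the square root of 2), yet `c` and `d` share two products while `a` and `b` share none, which Jaccard distance reflects. Note that k-means itself relies on Euclidean means, so using Jaccard would usually mean switching to a method that accepts an arbitrary distance, such as hierarchical clustering or k-medoids.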

smci
CuriousKK
  • Check out association rules, more specifically the `arules` package – cirofdo May 25 '18 at 15:08
  • Which similarity metric did you use with k-means? There are several. – smci Jun 13 '18 at 22:42
  • Related: [Difference between classification and clustering in data mining?](https://stackoverflow.com/questions/5064928/difference-between-classification-and-clustering-in-data-mining) – smci Jun 13 '18 at 22:51
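The `arules` package suggested above is for R. As an illustration only (this is not the `arules` API), the three usual association-rule measures can be computed by hand in Python: support (fraction of customers holding every product in the rule), confidence (support of the whole rule divided by support of its left-hand side) and lift (confidence divided by the support of the right-hand side). The baskets are the four sample rows from the question restricted to the visible columns `prod_1` to `prod_4`; `rule_metrics` and `single_item_rules` are hypothetical names, and the thresholds are arbitrary.

```python
from itertools import combinations

# The question's sample rows (000, 001, 011, 021), first four product columns only
baskets = [
    {"prod_1", "prod_4"},
    {"prod_3", "prod_4"},
    {"prod_2", "prod_4"},
    {"prod_1", "prod_3", "prod_4"},
]


def support(itemset, baskets):
    """Fraction of customers who hold every product in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)


def rule_metrics(lhs, rhs, baskets):
    """Return (support, confidence, lift) of the rule lhs -> rhs."""
    s_both = support(set(lhs) | set(rhs), baskets)
    s_lhs = support(lhs, baskets)
    s_rhs = support(rhs, baskets)
    confidence = s_both / s_lhs if s_lhs else 0.0
    lift = confidence / s_rhs if s_rhs else 0.0
    return s_both, confidence, lift


def single_item_rules(baskets, min_support=0.25, min_confidence=0.6):
    """All one-product -> one-product rules passing both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):  # try both directions
            s, c, l = rule_metrics({lhs}, {rhs}, baskets)
            if s >= min_support and c >= min_confidence:
                rules.append((lhs, rhs, s, c, l))
    return rules


if __name__ == "__main__":
    for rule in single_item_rules(baskets):
        print(rule)
```

On this sample every surviving rule points at `prod_4`, which all four customers hold, so each rule has lift 1.0: high confidence alone can be misleading when the right-hand product is near-universal.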

1 Answer


Use frequent itemset mining.

Abandon the idea that each customer belongs to exactly one segment. That doesn't hold in reality.

Instead, there are typical product combinations that identify segments, and these can overlap. One customer can be an electronics enthusiast and a Star Wars fan at the same time.
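A minimal sketch of the idea, assuming an Apriori-style level-wise search (one of several frequent itemset algorithms; FP-growth and Eclat are common alternatives). It finds the product combinations held by at least `min_support` of customers, then lists every frequent combination a given customer holds, so one customer can fall into several overlapping segments. The baskets are again the question's four sample rows restricted to `prod_1` to `prod_4`; `frequent_itemsets` and `segments_for` are hypothetical names. Pure Python like this will be slow on 2.4 million rows, where an optimized implementation (for example `arules` in R or `mlxtend.frequent_patterns` in Python) is more realistic.

```python
from itertools import combinations

# The question's sample rows (000, 001, 011, 021), first four product columns only
baskets = [
    {"prod_1", "prod_4"},
    {"prod_3", "prod_4"},
    {"prod_2", "prod_4"},
    {"prod_1", "prod_3", "prod_4"},
]


def frequent_itemsets(baskets, min_support):
    """Apriori-style search; returns {itemset: support} for itemsets with support >= min_support."""
    n = len(baskets)

    def sup(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single products
    items = sorted(set().union(*baskets))
    current = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    result = {s: sup(s) for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k products
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = [c for c in candidates
                      if all(frozenset(s) in result for s in combinations(c, k - 1))]
        current = [c for c in candidates if sup(c) >= min_support]
        result.update((c, sup(c)) for c in current)
        k += 1
    return result


def segments_for(basket, itemsets, min_size=2):
    """Every frequent combination of at least min_size products that this customer holds."""
    return [s for s in itemsets if len(s) >= min_size and s <= basket]


if __name__ == "__main__":
    fi = frequent_itemsets(baskets, min_support=0.5)
    for customer in baskets:
        print(sorted(customer), [sorted(s) for s in segments_for(customer, fi)])
```

With `min_support=0.5`, customer `021` (`prod_1`, `prod_3`, `prod_4`) matches both `{prod_1, prod_4}` and `{prod_3, prod_4}`, while customer `011` matches no frequent pair at all: an overlapping assignment rather than one segment per customer.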

Has QUIT--Anony-Mousse