
I am performing clustering on data points whose features are ordinal levels such as low, medium, and high. Is it advisable to convert them into numbers (low → 1, medium → 2, high → 3) and apply k-means directly, or should I use some other method?

I tried this approach (see the sketch below), but it does not always give good results: sometimes the results are very good, sometimes they are not.
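A minimal sketch of the kind of encoding I mean (the data and column names here are made up):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical ordinal feature with three levels.
df = pd.DataFrame({"level": ["low", "medium", "high", "medium", "low", "high"]})

# Map the ordered levels to integers, as described above.
df["level_num"] = df["level"].map({"low": 1, "medium": 2, "high": 3})

# Apply k-means directly to the integer codes.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df[["level_num"]])
print(labels)
```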

asif

2 Answers


Converting the ordered categories to numbers is okay, as long as the result is then treated as discrete and not continuous. But k-means essentially works only for continuous data, so I think a better option would be an algorithm like k-prototypes or k-modes: k-prototypes works for mixed continuous and categorical data, while k-modes works only for categorical data.
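For example, with the third-party kmodes package (a minimal sketch; the data and the number of clusters are illustrative):

```python
import numpy as np
from kmodes.kmodes import KModes

# Illustrative categorical data: rows are observations, columns are
# categorical features with levels low/medium/high.
X = np.array([
    ["low", "low", "medium"],
    ["high", "medium", "high"],
    ["low", "medium", "medium"],
    ["high", "high", "high"],
    ["medium", "low", "low"],
    ["high", "high", "medium"],
])

# k-modes clusters on matching dissimilarity (the count of mismatched
# categories) and uses per-cluster modes instead of means.
km = KModes(n_clusters=2, init="Huang", n_init=5, random_state=0)
labels = km.fit_predict(X)
print(labels)
print(km.cluster_centroids_)
```

k-prototypes (`kmodes.kprototypes.KPrototypes`) is used the same way, except its `fit_predict` takes a `categorical=` argument listing the indices of the categorical columns, so continuous columns can stay numeric.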

mnm

K-means does not make much sense on such data.

It's designed for continuous variables, where the name-giving mean makes sense and minimizes the sum of squared errors.
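Concretely, for points $x_1, \dots, x_n$ in a cluster, the mean is the unique minimizer of the squared error:

$$\arg\min_{\mu} \sum_{i=1}^{n} (x_i - \mu)^2 = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

which you can verify by setting the derivative with respect to $\mu$ to zero. But on codes like 1/2/3 the resulting average (say, 1.7) is not a valid category, so the objective loses its meaning.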

For categorical data, use k-medoids or k-modes instead!
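One way to run k-medoids on purely categorical data is over a precomputed dissimilarity matrix, e.g. Hamming distance. A minimal sketch, assuming the scikit-learn-extra package is installed (the data here is illustrative):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn_extra.cluster import KMedoids

# Illustrative integer-coded categorical data: 100 observations,
# 4 features, 3 levels each.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 4))

# Hamming distance: the fraction of features on which two rows disagree.
D = squareform(pdist(X, metric="hamming"))

# k-medoids needs only pairwise dissimilarities, and its centers
# (medoids) are actual data points, so no invalid "mean" appears.
labels = KMedoids(n_clusters=3, metric="precomputed", random_state=0).fit_predict(D)
print(labels[:10])
```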

Furthermore, you need to carefully consider variable importance.

Note that on categorical / discrete data, the optimization algorithms very frequently seem to get stuck in local optima, because there is no "continuous" path along which to improve the results. That is why the results are sometimes good and sometimes bad. You can increase the number of restarts (see the sketch below), but with increasing complexity your chance of a lucky guess decreases...
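To illustrate the restart idea with the kmodes package (assuming that is what you are using; the data is made up): raising `n_init` runs the algorithm from several random initializations and keeps the lowest-cost solution.

```python
import numpy as np
from kmodes.kmodes import KModes

# Made-up categorical codes: 200 observations, 4 features, 3 levels each.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))

# Each extra restart is another random initialization; the fit with
# the lowest cost (total dissimilarity to the cluster modes) is kept.
for n_init in (1, 20):
    km = KModes(n_clusters=3, init="Huang", n_init=n_init, random_state=0)
    km.fit(X)
    print(f"n_init={n_init}: cost={km.cost_}")
```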

Has QUIT--Anony-Mousse
  • Is there any way to make the algorithm work well in all conditions? Can you suggest some material where I can look to solve this kind of problem? – asif Mar 25 '19 at 08:25
  • This isn't easy to automate, because it is about your choices; there is no "right" way. – Has QUIT--Anony-Mousse Mar 25 '19 at 19:44
  • I tried k-modes as well now, but it is not giving good results either. Is there any other way I should try? – asif Mar 26 '19 at 10:51