
I am using a fuzzy c-means clustering implementation, and I would like the data X to be split into the number of clusters I define in the algorithm (I believe that is how it works). But the behavior is confusing.

cm = FCM(n_clusters=6)
cm.fit(X)

This code generates a plot with 4 labels - [0,2,4,6]

cm = FCM(n_clusters=4)
cm.fit(X)

This code generates a plot with 4 labels - [0,1,2,3]

I expect labels [0,1,2,3,4,5] when I set the number of clusters to 6.

code:

from fcmeans import FCM
from matplotlib import pyplot as plt
from seaborn import scatterplot as scatter

# fit the fuzzy-c-means
fcm = FCM(n_clusters=6)
fcm.fit(X)

# outputs
fcm_centers = fcm.centers
fcm_labels  = fcm.u.argmax(axis=1)

# plot result
%matplotlib inline
f, axes = plt.subplots(1, 2, figsize=(11,5))
scatter(X[:,0], X[:,1], ax=axes[0])
scatter(X[:,0], X[:,1], ax=axes[1], hue=fcm_labels)
scatter(fcm_centers[:,0], fcm_centers[:,1], ax=axes[1],marker="s",s=200)
plt.show()
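Before concluding that clusters are missing, it may help to count the distinct labels directly rather than reading them off the plot. A legend that includes 6 cannot be listing labels from a 6-cluster argmax (those run from 0 to 5), which hints that seaborn may be showing evenly spaced values for a numeric `hue` rather than every label; passing `legend="full"` to `scatter` (seaborn's `scatterplot`) should list every level. The sketch below uses a made-up stand-in for `fcm_labels` and a hypothetical helper `missing_clusters`; it is a diagnostic, not part of fcmeans.

```python
import numpy as np


def missing_clusters(labels, n_clusters):
    """Return the cluster indices in range(n_clusters) that no sample was assigned to."""
    present = np.unique(labels)
    return sorted(set(range(n_clusters)) - set(present.tolist()))


# Made-up stand-in for fcm_labels; in the question this would be fcm.u.argmax(axis=1)
fcm_labels = np.array([0, 0, 2, 3, 3, 5, 1, 2, 0, 5])

# Count how many samples carry each hard label that actually occurs
values, counts = np.unique(fcm_labels, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))  # {0: 3, 1: 1, 2: 2, 3: 2, 5: 2}
print(missing_clusters(fcm_labels, n_clusters=6))   # [4]
```

If `missing_clusters` returns an empty list for the real `fcm_labels`, all six clusters are present and the discrepancy is in the plot legend, not in the clustering.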
desertnaut
hakuna_code
  • Please include details of which implementation you are using and the relevant imports - if it is not `skfuzzy`, please remove the (added by myself) tag – desertnaut Jul 16 '19 at 11:31
  • It is an implementation from fcmeans. But the problem is not with the implementation; even with skfuzzy I see the same behavior. – hakuna_code Jul 16 '19 at 11:35
  • We always need the implementation, in order to try to *reproduce* the behavior... – desertnaut Jul 16 '19 at 11:45
  • Thanks! I added the full code; the implementation is fcmeans. – hakuna_code Jul 16 '19 at 11:49
  • But I mainly need to understand these clustering results. In general, can we expect the number of output clusters to equal the number defined (in my case n=6)? Or can the output differ from the given number of clusters? – hakuna_code Jul 16 '19 at 12:13
  • I would *suspect* that, if the algorithm cannot find "enough" clusters, it does not respect the `n_clusters` argument, and it effectively treats it as the max number of clusters to search for; but that is just my suspicion. I would like to experiment myself, but unfortunately I cannot install the package (`pip` does not seem to work). – desertnaut Jul 16 '19 at 12:17
  • Thanks, that's my suspicion too! `pip install fuzzy-c-means` worked for me, though! – hakuna_code Jul 16 '19 at 12:24

3 Answers

Fuzzy c-means is a fuzzy clustering algorithm.

The labels are only an approximation to the fuzzy assignment.

Most likely two clusters are pretty weak, and hence never win the argmax operation used to produce the labels. That doesn't mean these clusters have not been used; you are just not using the full fuzzy result.
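A small made-up membership matrix illustrates this: one cluster holds a noticeable share of every sample's membership but never has the largest value in any row, so it vanishes from the argmax labels. The helper name `cluster_usage` is hypothetical; in the question, `u` would correspond to `fcm.u`.

```python
import numpy as np


def cluster_usage(u):
    """Summarise an (n_samples, n_clusters) fuzzy membership matrix.

    Returns how many samples each cluster wins under argmax, and the
    total membership each cluster receives across all samples."""
    # minlength keeps clusters that never win the argmax, reported as 0
    hard_counts = np.bincount(u.argmax(axis=1), minlength=u.shape[1])
    total_membership = u.sum(axis=0)
    return hard_counts, total_membership


# Made-up memberships for 5 samples and 3 clusters (each row sums to 1).
# Cluster 2 gets 0.25-0.32 of every sample but is never the largest entry.
u = np.array([
    [0.60, 0.15, 0.25],
    [0.55, 0.20, 0.25],
    [0.20, 0.50, 0.30],
    [0.10, 0.60, 0.30],
    [0.40, 0.28, 0.32],
])
counts, totals = cluster_usage(u)
print(counts)  # [3 2 0] -> cluster 2 disappears from the hard labels
print(totals)  # cluster 2 still carries a sizeable share of the total membership
```

Looking at the membership columns themselves, rather than only their argmax, shows which clusters are weak instead of absent.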

Has QUIT--Anony-Mousse

I'm using fuzzy-c-means version 1.7.0:

>>> import fcmeans
>>> fcmeans.__version__
'1.7.0'

Using the Iris dataset from scikit-learn:

>>> from sklearn.datasets import load_iris
>>> iris = load_iris().data
>>> model = fcmeans.FCM(n_clusters = 2)
>>> model.fit(iris)
>>> pred = model.predict(iris)
>>> from collections import Counter
>>> Counter(pred)
Counter({0: 97, 1: 53})

So `n_clusters` is applied correctly.
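To make the role of `n_clusters` concrete without depending on a particular package version, here is a minimal textbook fuzzy c-means in NumPy. It is a sketch under my own assumptions (random initial memberships, fuzzifier m=2, Euclidean distance) and not the code inside fcmeans or skfuzzy; the function name `fuzzy_c_means` is hypothetical. In this sketch, `n_clusters` fixes the number of centers and membership columns, while the number of distinct argmax labels is a separate, data-dependent quantity.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=300, tol=1e-5, seed=0):
    """Minimal textbook fuzzy c-means (a sketch, not the fcmeans package's code).

    Returns (centers, u) where centers has shape (n_clusters, n_features)
    and u has shape (n_samples, n_clusters) with rows summing to 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, each row normalised to sum to 1
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Centers are membership-weighted means: c_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Distance from every sample to every center, shape (n, n_clusters)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        new_u = 1.0 / ratio.sum(axis=2)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


rng = np.random.default_rng(42)
# Three well-separated synthetic blobs of 30 points each
blobs = np.vstack([rng.normal(loc, 0.5, size=(30, 2)) for loc in [(0, 0), (10, 0), (0, 10)]])

centers, u = fuzzy_c_means(blobs, n_clusters=3)
labels = u.argmax(axis=1)
print(u.shape)                           # (90, 3)
print(np.bincount(labels, minlength=3))  # samples per hard label

# Asking for more clusters than natural groups still yields 6 membership columns;
# how many of them win the argmax depends on the data and the initialisation.
centers6, u6 = fuzzy_c_means(blobs, n_clusters=6)
print(u6.shape)                          # (90, 6)
print(np.unique(u6.argmax(axis=1)))
```

Comparing `u6.shape[1]` with `len(np.unique(u6.argmax(axis=1)))` on real data separates "the model has fewer clusters" from "some clusters never win the argmax".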

Shayan

I read about it, and it looks like once the algorithm reaches the knee point (the maximum number of clusters it can form from the data), it won't create any more clusters than that. So in my question, 4 was the maximum number of clusters the algorithm could form with the given dataset.

hakuna_code
  • No. It does not use a "knee" point. There is no reliable mathematical definition of this that it could even use. You're misinterpreting the result by using `argmax`. – Has QUIT--Anony-Mousse Jul 24 '19 at 19:12