
According to the algorithm description, computing an X-means clustering of a dataset requires seeding the initial centers properly.

WEKA's XMeans has an option to specify the initial centers, and in other X-means libraries the user often has to provide an initial set of centers.

However, there is no indication of whether, or how, WEKA's XMeans creates the initial centers when none are provided.



How does WEKA produce the initial centers if none are provided? Or is it necessary to generate the initial centers yourself in order to run the X-means algorithm properly?

Chris

1 Answer

You cannot use predefined centers with x-means, because it works recursively on subsets of the data.

You could define the initial kmin (usually 2) centers, but you cannot predefine what happens after that. The whole purpose of x-means is to not have to know k beforehand; if you predefine k centers, you are assuming that this is the right k.
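To make the "predefine only the first kmin centers" idea concrete, here is a minimal sketch in plain Python. It is not WEKA's, ELKI's, or pyclustering's actual code: `seed_centers` and `kmeans` are hypothetical helper names, and the random-sampling fallback is only an assumed stand-in for whatever a real library does when no centers are given.

```python
import random


def seed_centers(points, k, kmin, predefined=None, rng=None):
    """Return k starting centers (hypothetical helper, not a library API).

    The caller's vectors are used only for the first run (k == kmin).
    Any other run falls back to sampling k distinct data points, which
    is an assumed default, not what any particular library does.
    """
    rng = rng or random.Random(0)
    if predefined is not None and k == kmin:
        if len(predefined) != kmin:
            raise ValueError("need exactly kmin predefined centers")
        return [tuple(c) for c in predefined]
    return [tuple(p) for p in rng.sample(points, k)]


def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
kmin = 2

# First run: the kmin centers can be supplied by the user.
start = seed_centers(points, kmin, kmin, predefined=[(0.0, 0.0), (10.0, 10.0)])
centers, clusters = kmeans(points, start)

# Any later run with a different k ignores the predefined vectors.
later = seed_centers(points, 3, kmin, predefined=[(0.0, 0.0), (10.0, 10.0)])
```

Only the first k-means run can be seeded this way. The splits that x-means tries afterwards happen inside each resulting cluster, so their starting points cannot be fixed in advance.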

Has QUIT--Anony-Mousse
  • This doesn't seem correct. Can you clarify? I say this because xmeans extends kmeans, and kmeans requires pre-chosen centers. The methodology of this process is important. Additionally, with respect to xmeans in weka and pyclustering, I can provide centers in the former, and must provide centers in the latter. – Chris Aug 19 '17 at 21:58
  • So yes, it does work recursively on subsets... of an initial kmeans. – Chris Aug 19 '17 at 21:59
  • That initial kmeans has k=2 centers, and the main assumption is that you do *not* know k yet. – Has QUIT--Anony-Mousse Aug 19 '17 at 22:06
  • Right, so if I have a lower bound on the number of clusters (which I can specify in weka's xmeans), then presumably I run a kmeans with a cluster count equal to the lower bound. Or, alternatively, I use a list of initial centers whose length equals the lower bound. – Chris Aug 19 '17 at 22:09
  • Yes, you could predefine the kmin centers of the first run. But the assumption of x-means is that the true k >> kmin. So this seems to be a rather unusual scenario. – Has QUIT--Anony-Mousse Aug 19 '17 at 22:13
  • Yes. I have an expected range, and weka provides for this option. But what you are saying is that pre-seeding the xmeans with a lower bound of centers could inhibit its final performance? If that is the case, then seeding the xmeans with the cluster settings of a kmeans with k = 2 would produce the best results, right? And, if I used the right kmeans library, I would not have to implement the center initialization process: is that what you are saying? – Chris Aug 19 '17 at 22:17
  • I don't think it can "inhibit" its performance, only that you are unlikely to *benefit* from predefining the initial centers for kmin (usually 2). You could try ELKI: add your own initializer that returns the predefined vectors for kmin and otherwise uses the usual xmeans approach. I doubt it will make a noticeable difference. – Has QUIT--Anony-Mousse Aug 19 '17 at 22:25