Weka ClusterMembership Filter gives only 1 and 0 probabilities

Question

Recently, I have been working with Weka to cluster data into groups using the built-in EM clusterer. The clustering itself works fine, but when I save the output file, the "probabilities" of being in each cluster are all 0's and 1's. This made me suspicious, as it seems unlikely that Weka could distinguish between clusters with 100% confidence. So I generated data that was essentially random and "unclusterable", if you will, and upon reclustering, the output probabilities were again all 1's and 0's.

To be sure the clusterer wasn't clustering on some feature that I was completely overlooking, I also made a separate utility to generate a t-SNE plot of the random data. Sure enough, it looked random, and the clusters the EM clusterer made didn't really make sense, as should be the case for random data.

My question then is this: why is Weka's ClusterMembership filter yielding only 1's and 0's for the probability of being in a cluster, even for completely random data? Am I missing something very obvious, or is there a deeper issue?

Here is the ClusterMembership documentation and here is the closest related question I could find on SO, but it seems pretty far off from what I want. Any suggestions or ideas are welcome. The only reasons I can think of for this are that there is something fundamentally wrong with the way my data is structured (which seems unlikely, because I have used this data in other learning problems with a high degree of success), or that Weka's clustering itself is just not that good, which from my previous question seems plausible, although I hope this is not the case.

Update: I managed to replicate this problem with the following minimalist .arff file:

@relation 'Test'

@attribute x numeric
@attribute y numeric

@data 
{0 1}
{1 1}
{}
{0 1,1 1}

Running this through the ClusterMembership filter (2 clusters), I again get probabilities that are all 1's or 0's. This does not seem to make sense, as there are multiple ways to cluster this data into 2 groups, so a probability of 1 for any cluster is not realistic. I should add that I am using Weka 3.8.1.
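For readers less familiar with the sparse ARFF notation used in the file above, here is a minimal sketch of how those four rows decode into dense (x, y) points. It is not Weka's own parser; the class name `SparseArffDemo` and the `decode` helper are hypothetical and exist only for illustration. It assumes the usual sparse ARFF convention that each entry is `index value` and that omitted attributes are 0.

```java
import java.util.Arrays;

class SparseArffDemo {
    // Decode one sparse ARFF row such as "{0 1,1 1}" into a dense vector.
    // Each entry is "<attribute index> <value>"; unlisted attributes stay 0.
    static double[] decode(String row, int numAttributes) {
        double[] dense = new double[numAttributes];
        String body = row.trim();
        body = body.substring(1, body.length() - 1).trim(); // strip the braces
        if (body.isEmpty()) {
            return dense; // "{}" means every attribute is 0
        }
        for (String entry : body.split(",")) {
            String[] parts = entry.trim().split("\\s+");
            int index = Integer.parseInt(parts[0]);
            dense[index] = Double.parseDouble(parts[1]);
        }
        return dense;
    }

    public static void main(String[] args) {
        String[] rows = {"{0 1}", "{1 1}", "{}", "{0 1,1 1}"};
        for (String row : rows) {
            System.out.println(row + " -> " + Arrays.toString(decode(row, 2)));
        }
    }
}
```

Under that convention the rows are the points (1, 0), (0, 1), (0, 0) and (1, 1), i.e. the four corners of a unit square.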

Well, have you tried other tools then? Such as ELKI, KNIME, RapidMiner, ... — Has QUIT--Anony-Mousse, Aug 01 '18 at 18:17
@Anony-Mousse I have not, but this is mainly because it would be a major inconvenience to try to get a non-Weka ML system up and running given the code I have already built up. If Weka simply does not work, then of course I will find a way to work around this, but for now it would be significantly easier if we could just use Weka. — Alerra, Aug 01 '18 at 20:09
That is a rather odd arff file you are offering. WEKA did read it, but the value of the data is two copies of two distinct points. These could reasonably be divided into two groups each with standard deviations of zero, thus giving 100% probability of cluster membership. — G5W, Aug 05 '18 at 23:28
@G5W, I realize I made a typo in my question that turned it into a stupid question, my bad. My original arff file, however, was as the question shows now: 4 distinct points arranged in a 'square' in some sense of the word, and I was still getting the 1's and 0's for my probabilities. Any ideas on this? — Alerra, Aug 06 '18 at 13:29
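G5W's point about zero standard deviations can be illustrated numerically, and it may carry over to the four-point square as well. The sketch below is not Weka's EM code. It assumes a two-component mixture of axis-aligned Gaussians with equal priors, and it assumes EM has split the square into a left pair (x = 0) and a right pair (x = 1), so the x standard deviation of each cluster collapses to a small floor. The floor value `1e-6` and the class name `PosteriorSaturationDemo` are assumptions made for illustration; which split EM actually finds depends on its initialization.

```java
import java.util.Arrays;

class PosteriorSaturationDemo {
    // Log density of an axis-aligned (diagonal) Gaussian at point x.
    static double logDensity(double[] x, double[] mean, double[] std) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / std[i];
            sum += -0.5 * z * z - Math.log(std[i]) - 0.5 * Math.log(2 * Math.PI);
        }
        return sum;
    }

    // Posterior cluster probabilities for x, assuming equal cluster priors.
    // Normalization is done in log space to avoid overflow.
    static double[] posteriors(double[] x, double[][] means, double[][] stds) {
        int k = means.length;
        double[] p = new double[k];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < k; c++) {
            p[c] = logDensity(x, means[c], stds[c]);
            max = Math.max(max, p[c]);
        }
        double total = 0.0;
        for (int c = 0; c < k; c++) {
            p[c] = Math.exp(p[c] - max);
            total += p[c];
        }
        for (int c = 0; c < k; c++) {
            p[c] /= total;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] point = {0.0, 0.0};
        // Assumed split: left cluster centered at x = 0, right cluster at x = 1.
        double[][] means = {{0.0, 0.5}, {1.0, 0.5}};
        // x spread collapsed to an assumed floor of 1e-6; y spread is 0.5.
        double[][] tight = {{1e-6, 0.5}, {1e-6, 0.5}};
        // For contrast: the same means with a wide x spread.
        double[][] wide = {{0.5, 0.5}, {0.5, 0.5}};
        System.out.println("tight: " + Arrays.toString(posteriors(point, means, tight)));
        System.out.println("wide:  " + Arrays.toString(posteriors(point, means, wide)));
    }
}
```

With the collapsed spread, the point (0, 0) lies a million standard deviations from the other cluster's mean, so its posterior underflows to exactly 0 in double precision and the memberships come out as 1 and 0. With the wide spread the same point gets roughly 0.88 and 0.12. If this is what is happening, raising the clusterer's minimum standard deviation (if the Weka version in use exposes such a setting) would be one thing to try, though this has not been verified against Weka 3.8.1 here.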
