How to cluster an instance with Weka's DBSCAN?

Question

I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand I should be using the clusterInstance() method for this, but to my surprise, when taking a look at the code of that method, it looks like the implementation ignores the parameter:

/**
 * Classifies a given instance.
 *
 * @param instance The instance to be assigned to a cluster
 * @return int The number of the assigned cluster as an integer
 * @throws java.lang.Exception If instance could not be clustered
 * successfully
 */
public int clusterInstance(Instance instance) throws Exception {
    if (processed_InstanceID >= database.size()) processed_InstanceID = 0;
    int cnum = (database.getDataObject(Integer.toString(processed_InstanceID++))).getClusterLabel();
    if (cnum == DataObject.NOISE)
        throw new Exception();
    else
        return cnum;
}

This doesn't seem right. How is that supposed to work? Is there a different method I should be using for clustering? Do I have to run this method sequentially on all instances, in some specific order, if I want to get any useful information out of it?

On a side note: DBSCAN is spelled all uppercase, not DBScan. Just another bug in Weka. Clustering in Weka isn't very usable, unfortunately. After all, Weka is more of a machine learning toolkit. — Has QUIT--Anony-Mousse, Feb 06 '12 at 12:45
**Please do not use Weka's versions of DBSCAN and OPTICS anymore**. They are unsupported student contributions, feature-incomplete and really slow. For clustering, please use ELKI instead. — Erich Schubert, Jan 01 '13 at 11:40
Update: Weka DBSCAN version 1.0.3 has become significantly faster (not as fast as ELKI though). OPTICS too, but it won't yet extract clusters from the plot automatically (see ELKI OPTICSXi for that). — Erich Schubert, Jan 13 '13 at 10:45

Mark McLaren · Answer 1 · 2011-09-21T23:22:46.763

This has been reported as a bug on the Weka mailing list: [Wekalist] DBScan - Issue/Bug with "clusterInstance()"-Function.

I'm doing some clustering with the DBScan library. Unfortunately it seems that there is a bug in the function "clusterInstance()". The function doesn't return the number of the assigned cluster but only returns the cluster-number of the first database element (or the second on the second call, the third on the third call, and so on.) and NOT the assigned instance.

It simply cannot work because the assigned variable is never used in the function.

The response reads:

DBScan and Optics are contributions to Weka. It's probably best if you contact the authors to see if they can suggest a bug fix. The code and package info (Weka 3.7) has contact information:

http://weka.sourceforge.net/packageMetaData/optics_dbScan/index.html

I'm afraid I am unfamiliar with the DBScan algorithm and the code is quite old now (2004), you might be lucky and find that you are still able to contact the authors at LMU Munich.

I found numerous copies of it via Google Code Search and GitHub, but I could not find one in which the bug had been fixed. While searching I noticed several other implementations of DBScan that you could examine to work out how this one could be fixed (e.g. ELKI's DBSCAN).

As I have said, I am unfamiliar with DBScan, but looking at the JavaDocs gave me the impression that the actual clustering is invoked by calling buildClusterer(Instances instances). Examining the source code, there seems to be much more going on inside the buildClusterer method than inside the clusterInstance method. OPTICS.java contains a clusterInstance method too, and that one just throws an exception. If you are lucky, maybe you can get by without a functioning clusterInstance method.

I found an example of Weka's DBScan being used here: DBSCANClustering.java

Thank you for confirming it's a bug and pointing me to the relevant example. Once I find a way to actually work-around it I'll post it here as an answer. — Oak, Sep 22 '11 at 12:03
DBSCAN and OPTICS are pretty much unsupported in Weka. In general, the clustering capabilities of Weka are not worth investigation, unfortunately. — Has QUIT--Anony-Mousse, Nov 25 '11 at 18:14
They are also **horrendously slow**. Definitely don't use Weka here. — Has QUIT--Anony-Mousse, Jun 30 '12 at 11:38
**Please don't use DBSCAN and OPTICS from Weka anymore!** They are old and unsupported student contributions, and their performance is *really* bad. Use ELKI for clustering. — Erich Schubert, Jan 01 '13 at 11:38
Update: Weka DBSCAN version 1.0.3 has become significantly faster (not as fast as ELKI though). OPTICS too, but it won't yet extract clusters from the plot automatically (see ELKI OPTICSXi for that). — Erich Schubert, Jan 13 '13 at 10:46

Dario Seidl · Answer 2 · 2011-09-22T00:06:23.747

The example posted by Mark shows well how to use the DBScan class.

The method that does the actual clustering is DBScan.buildClusterer(Instances instances).

The DBScan.clusterInstance(Instance instance) method is supposed to return the number of the assigned cluster for a given instance (after you have run the buildClusterer method). But it's true that the parameter is ignored, so I guess it won't do what it's supposed to do.

`buildClusterer` should be invoked in any case, for other clusterer types as well. – Oak Sep 22 '11 at 12:03
@Oak Yes, I just wanted to point out that `buildClusterer` seems to do the actual clustering here. – Dario Seidl Sep 22 '11 at 12:22

score 0 · Accepted Answer · answered Nov 16 '11 at 18:39

As Mark answered, this is obviously a bug. As long as you query about instances in the exact same order in which they were inserted into the clusterer it's okay; but it won't work in any other case.
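
To see why only in-order queries give correct answers, here is a minimal simulation of the cursor logic from the method quoted in the question. CursorClusterer is a hypothetical stand-in, not Weka code: labels are supplied by hand instead of being computed by DBSCAN, and the noise case is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the cursor logic of the quoted clusterInstance().
// It is not part of Weka; labels are entered by hand and noise handling is omitted.
class CursorClusterer {
    // One label per stored instance, in insertion order.
    private final List<Integer> labels = new ArrayList<>();
    private int processedInstanceId = 0;

    void add(int clusterLabel) {
        labels.add(clusterLabel);
    }

    // The argument is ignored, as in the quoted method: only the cursor decides the answer.
    int clusterInstance(double[] ignored) {
        if (processedInstanceId >= labels.size()) processedInstanceId = 0;
        return labels.get(processedInstanceId++);
    }

    public static void main(String[] args) {
        CursorClusterer clusterer = new CursorClusterer();
        clusterer.add(0); // label of instance a, stored first
        clusterer.add(1); // label of instance b, stored second
        double[] b = {5.0, 5.0};
        // Asking about b first still yields the label stored first.
        System.out.println(clusterer.clusterInstance(b));
    }
}
```

In this sketch the first call returns 0, the label of a, even though b was passed in. The answers line up with the arguments only when the calls follow the insertion order.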

A co-worker solved this by writing her own version of the DBScan class: essentially identical (copy-pasted), except that she maintains a mapping between instances and cluster labels. This mapping can be produced by iterating over the contents of the database instance. The appropriate cluster for an instance can then be immediately retrieved from that mapping.
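
The mapping idea can be sketched roughly as follows. This is not my co-worker's actual code: ClusterLabelIndex and its methods are hypothetical names, and instances are modelled as plain double[] attribute vectors so that the example runs without Weka. In the copied DBScan class, record() would be called once for every data object in the database after buildClusterer has finished, using the label from getClusterLabel().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table of the kind a patched copy of DBScan could maintain.
// Not part of Weka; instances are modelled as plain double[] attribute vectors.
class ClusterLabelIndex {
    // Assumed "no cluster" marker, returned instead of throwing an Exception.
    static final int NOISE = -1;

    private final Map<String, Integer> labelByKey = new HashMap<>();

    // double[] does not override equals/hashCode, so key on the array contents.
    private static String key(double[] values) {
        return Arrays.toString(values);
    }

    // Call once per stored instance after clustering has finished.
    void record(double[] values, int clusterLabel) {
        labelByKey.put(key(values), clusterLabel);
    }

    // Looks up the instance that was actually passed in; unknown instances count as noise.
    int clusterOf(double[] values) {
        return labelByKey.getOrDefault(key(values), NOISE);
    }

    public static void main(String[] args) {
        ClusterLabelIndex index = new ClusterLabelIndex();
        index.record(new double[] {1.0, 1.0}, 0);
        index.record(new double[] {5.0, 5.0}, 1);
        // The query order no longer matters.
        System.out.println(index.clusterOf(new double[] {5.0, 5.0}));
        System.out.println(index.clusterOf(new double[] {1.0, 1.0}));
    }
}
```

Keying on attribute values means two stored instances with identical values share one entry, and the label recorded last wins. The lookup only covers instances that were part of the clustered data; anything else is reported as noise in this sketch.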

Editing this method is also a good opportunity to change the throw new Exception into something more sensible in this context, such as return -1.
