I am trying to use ELKI's SLINK implementation of hierarchical clustering in my program.

I have a set of objects (of my own type) that need to be clustered. For that, I convert them to feature vectors before clustering.

This is how I currently get it to run and produce a result (the code is in Scala):

```scala
val clusterer = new SLINK(CosineDistanceFunction.STATIC, 3)
val connection = new ArrayAdapterDatabaseConnection(featureVectors)
val database = new StaticArrayDatabase(connection, null)
database.initialize()

val result = clusterer.run(database).asInstanceOf[Clustering[_ <: Model]]
```

The result is a `Clustering` that contains elements of type `Model`. I can output them, but I don't know how to make sense of this result, especially since SLINK returns models of type `DendrogramModel`, which does not seem to be parametrizable.

Specifically, how can I link the results back to my original elements (the ones from which I created `featureVectors` earlier)?

I assume I need to create some kind of custom model, or otherwise carry a link to the original elements through initialization and execution of the algorithm so that I can retrieve it from the result. However, I cannot find where to get started on this.

I am aware that embedding ELKI into one's own programs is discouraged. However, calling ELKI in some other way would not change the problem: I need to cluster and map the results back to my objects while my program is running.

Has QUIT--Anony-Mousse
notan3xit

1 Answer

The `DendrogramModel` does not include the objects in the cluster. Models are additional metadata on the clusters.

Use the `getIDs()` method to access the members of a `Cluster` instance.
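One way to get from member IDs back to the original objects is to rely on input order. The sketch below does not call ELKI at all; it only illustrates the bookkeeping. It assumes that each member ID can be turned into the zero-based row position of the corresponding vector in the input array, which is an assumption to verify against the ELKI version in use. `Document`, `toFeatures` and `resolve` are hypothetical names, and the cluster contents are made up for illustration.

```scala
object MapBackSketch {
  // Hypothetical domain type standing in for "objects of my own type".
  final case class Document(title: String)

  // Hypothetical conversion to a feature vector; any deterministic mapping works.
  def toFeatures(d: Document): Array[Double] =
    Array(d.title.length.toDouble, d.title.count(_ == ' ').toDouble)

  // Resolve cluster members, given as zero-based row positions, to the originals.
  def resolve[A](originals: IndexedSeq[A], clusters: Seq[Seq[Int]]): Seq[Seq[A]] =
    clusters.map(_.map(originals))

  def main(args: Array[String]): Unit = {
    val originals = Vector(Document("a b"), Document("c"), Document("d e f"))

    // Row i of featureVectors is built from originals(i); this ordering is the link.
    val featureVectors: Array[Array[Double]] = originals.map(toFeatures).toArray
    require(featureVectors.length == originals.length)

    // Assumed clustering outcome, expressed as row positions per cluster.
    val clusters = Seq(Seq(0, 2), Seq(1))

    resolve(originals, clusters).foreach { members =>
      println(members.map(_.title).mkString(", "))
    }
  }
}
```

The design choice here is to keep the original objects outside the clustering library and treat the row position as the only shared key, so no custom model is needed.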

Erich Schubert
  • Okay, so I get the members of a cluster using `getIDs()`. Following [your example on running algorithms](http://stackoverflow.com/questions/15326505/running-clustering-algorithms-in-elki/15334879#15334879), I acquire a `Relation` by calling `db.getRelation(TypeUtil.DOUBLE_VECTOR_FIELD)` and can then `get()` my vectors from it. Is that workflow correct? I still have no way to get back to my original objects. Can I somehow use my own `DoubleVector` subclass (one that contains a reference to my data) when initializing and running the algorithm? – notan3xit Jul 17 '13 at 12:45
  • Yes, get the relation of the desired type, then get the instances by their DBID. Avoid creating `DBID` objects; you can call `get` with the iterator, which is much cheaper. Many algorithms will run with any class that you have a distance function for. Most distance functions are defined for arbitrary `NumberVector`s. So either implement your own distance function (for your data type), or implement the `NumberVector` interface on your data type, and you should be good to go. – Erich Schubert Jul 17 '13 at 16:04
  • I think I understand how to plug in my `DistanceFunction`. But the rest is exactly the problem: How _can_ I have an algorithm run on custom classes? Do I have to implement my own `DatabaseConnection` for that? None of the existing implementations seem generic enough. – notan3xit Jul 17 '13 at 16:22
  • Implementing a `DatabaseConnection` is probably the easiest way to get custom data into the ELKI database layer. If the data is file resident, it may however suffice to implement a `Parser` instead, or a `NumberVector.Factory` (if the existing parser can be reused, but should produce different objects). – Erich Schubert Jul 23 '13 at 07:50
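
The idea discussed in these comments, a data type that carries a reference to its source object together with a distance function defined on that type, can be sketched in plain Scala. This is only an illustration of the shape of the idea: it does not implement ELKI's `NumberVector`, `DistanceFunction` or `DatabaseConnection` interfaces, and `Tagged` and `cosineDistance` are hypothetical names. Cosine distance is taken here as 1 minus cosine similarity.

```scala
object TaggedVectorSketch {
  // Hypothetical wrapper: feature values plus a reference to the source object.
  final case class Tagged[A](values: Array[Double], source: A)

  // Cosine distance as 1 - cosine similarity.
  // Assumes both vectors have the same length and non-zero norm.
  def cosineDistance[A](x: Tagged[A], y: Tagged[A]): Double = {
    require(x.values.length == y.values.length, "dimension mismatch")
    val dot = x.values.zip(y.values).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.values.map(v => v * v).sum)
    val normY = math.sqrt(y.values.map(v => v * v).sum)
    1.0 - dot / (normX * normY)
  }

  def main(args: Array[String]): Unit = {
    val a = Tagged(Array(1.0, 0.0), "first object")
    val b = Tagged(Array(0.0, 1.0), "second object")

    // The distance only looks at the values; the source reference rides along.
    println(s"${a.source} vs ${b.source}: ${cosineDistance(a, b)}")
  }
}
```

Because the distance reads only `values`, the `source` field can hold any payload without affecting the clustering, and every vector retrieved later still points at the object it was built from.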