I had a scenario where I needed to cluster operating system data. The actual data consists of 151 users on Windows, 27 users on MAC, and 5 users on Linux.

After clustering with the Carrot2 API using Lingo3GClusteringAlgorithm, I get a MAC OS cluster with 27 users and a Linux cluster with 5 users, but all Windows users end up in the Other Topics cluster. I would like the Windows users to form a separate cluster. Which clustering attributes do I need to configure to achieve that? Currently I am only setting "combined-cluster-score-balance" to 1.0. Any help is appreciated.

Pavan
  • Carrot2 performs unsupervised text-based clustering, so the results will never be perfect. If you could make your data set available somewhere, we could inspect it to see if any further tuning is possible. – Stanislaw Osinski Dec 05 '16 at 08:24
  • Hey @StanislawOsinski, sorry for the late reply, and thanks for your help. Here is the link where I posted the data: https://pastebin.com/VgNUdjdM. I am using the following configs: ("combined-cluster-score-balance", "1.0"); ("active-language", "ENGLISH"); ("max-cluster-size", 1.0), with Lingo3GClusteringAlgorithm.class as the clustering algorithm. – Pavan May 26 '17 at 20:13

1 Answer

Both Carrot2 and Lingo3G are natural-text clustering engines. You'll need at least a dozen documents, each consisting of at least a paragraph of text, to get sensible results.

Looking at your data, the text fields contain one word each, which is far too little for our algorithms to succeed. For your specific data you may need one of the generic clustering algorithms suitable for numeric and nominal data. Mahout and WEKA might be a good start.
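If each record really is a single nominal label, grouping identical values may already be enough before reaching for a clustering library. The following is only a minimal sketch in plain Java, not Carrot2, Lingo3G, Mahout, or WEKA code. The class name `OsGrouping` and the method `groupByValue` are hypothetical, and the sample list is synthetic, built only to mirror the counts quoted in the question rather than the actual pastebin data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class OsGrouping {

    // Hypothetical helper: treats every distinct label as its own group
    // and counts how many records carry that label.
    static Map<String, Long> groupByValue(List<String> labels) {
        return labels.stream()
                .map(String::trim) // ignore stray whitespace around a label
                .collect(Collectors.groupingBy(
                        label -> label,
                        TreeMap::new, // sorted keys for stable output
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Synthetic data (assumption): same sizes as reported in the question.
        List<String> labels = new ArrayList<>();
        labels.addAll(Collections.nCopies(151, "Windows"));
        labels.addAll(Collections.nCopies(27, "MAC"));
        labels.addAll(Collections.nCopies(5, "Linux"));

        groupByValue(labels)
                .forEach((os, count) -> System.out.println(os + ": " + count));
    }
}
```

Exact-match grouping like this only works when the labels are already normalized. If the values vary in spelling or come with additional numeric fields, a generic clustering toolkit of the kind mentioned above is the more likely fit.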

Stanislaw Osinski
  • Okay. So, is there any way to improve the clustering of the data I provided by adding further configs on top of the existing ones? – Pavan May 30 '17 at 15:11
  • Unfortunately not, Carrot2 and Lingo3G are not applicable to your data. They require at least a paragraph of natural text to work. Individual words are not enough. – Stanislaw Osinski May 31 '17 at 08:47