
I have data from sensors and I want to run a clustering algorithm on it. The data contains no cluster labels, but I can add some labels manually.

How can I use the manually added labels to help unsupervised learning?

One small example: use the labeled measurements as initial centers for k-means. Which density-based algorithm can I use for this data?

Daniel
cuga
  • What's the size of your data? How many labels are you prepared to manually label? – user2974951 Dec 07 '18 at 13:21
  • The size can be 100k-1m rows. About 7 labels and 10 examples for each – cuga Dec 07 '18 at 13:52
  • https://stackoverflow.com/questions/21258367/what-are-some-packages-that-implement-semi-supervised-constrained-clustering – hellpanderr Dec 07 '18 at 16:18
  • Semi-supervised learning is a good option. The idea is that you manually label some data points and then use a classification algorithm, such as kNN, to obtain more labels. With kNN, for example, you could label the cases that are close to your manual labels. This should give you enough labels to perform cluster analysis and label all the remaining cases. – user2974951 Dec 07 '18 at 18:39

1 Answer


You can choose which samples serve as the initial centers for k-means through the init argument of scikit-learn's KMeans (see the KMeans documentation).

If an ndarray is passed to init, it should be of shape (n_clusters, n_features) and gives the initial centers. In this case a single initialization is performed, using the centroids specified in the array, as explained here.

This required shape means that init must have exactly n_clusters rows, and the number of elements in each row must match the dimensionality of the actual data points, as discussed here.
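As a minimal sketch of this idea (not a definitive recipe), the snippet below averages the manually labeled measurements of each label to get one initial center per cluster and passes the resulting (n_clusters, n_features) array to init. The synthetic data from make_blobs, the blob centers, the 10 labeled points per group, and the variable names are all assumptions made for illustration; they stand in for the real sensor data and hand-made labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the sensor data (assumption: 3 well-separated groups).
X, true_groups = make_blobs(
    n_samples=600,
    centers=[[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]],
    cluster_std=0.5,
    random_state=0,
)
n_clusters = 3

# Pretend that 10 points per group were labeled by hand.
rng = np.random.default_rng(0)
labeled_idx = np.concatenate([
    rng.choice(np.flatnonzero(true_groups == k), size=10, replace=False)
    for k in range(n_clusters)
])
manual_labels = true_groups[labeled_idx]
X_labeled = X[labeled_idx]

# One initial center per manual label: the mean of its labeled measurements.
# Resulting shape is (n_clusters, n_features), as init requires.
init_centers = np.vstack([
    X_labeled[manual_labels == k].mean(axis=0) for k in range(n_clusters)
])

# With an explicit array for init, a single initialization is enough.
kmeans = KMeans(n_clusters=n_clusters, init=init_centers, n_init=1)
cluster_ids = kmeans.fit_predict(X)

print(init_centers.shape)
print(kmeans.cluster_centers_.shape)
```

Because row k of init_centers is built from manual label k, cluster index k in the result starts out tied to that label, which makes it easier to map clusters back to the hand-made labels afterwards.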

CarlosHPF