
Hi, I have computed the mean of each vector and used DBSCAN to cluster the means. However, I am unsure how to plot the results, since my data does not have an [x, y, z, ...] format.

Sample dataset:

mean_vec = [[2.2771908044815063],
 [3.0691280364990234],
 [2.7700443267822266],
 [2.6123080253601074],
 [2.6043469309806824],
 [2.6386525630950928],
 [2.7034034729003906],
 [2.3540258407592773]]

I used the code below (from scikit-learn) to get my clusters:

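import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Standardize the (n_samples, 1) data; eps is then measured in standard deviations.
X = StandardScaler().fit_transform(mean_vec)
db = DBSCAN(eps=0.15, min_samples=5).fit(X)
# Boolean mask marking the core samples found by DBSCAN.
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
# One label per sample; -1 marks noise.
labels = db.labels_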

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

print('Estimated number of clusters: %d' % n_clusters_)

Is it possible to plot my clusters? The plotting code from the scikit-learn example does not work for me. The scikit-learn link can be found here.

Cua
  • So, as I understand it, you want to cluster a 1D vector? – avchauzov Oct 10 '18 at 05:56
  • Yes, maybe with something like a horizontal scatter chart? – Cua Oct 10 '18 at 07:16
  • I think that DBSCAN may work with 1D data with some modifications to the algorithm: https://arxiv.org/pdf/1602.03730.pdf You can take a look at one clustering approach here: https://stackoverflow.com/questions/35094454/how-would-one-use-kernel-density-estimation-as-a-1d-clustering-method-in-scikit A GMM (Gaussian mixture model) may work too. – avchauzov Oct 10 '18 at 07:54
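A Gaussian mixture model, as suggested in the last comment, can be fitted to 1D data with scikit-learn's `GaussianMixture`. This is a minimal sketch, not the commenter's code. It assumes you choose the number of components yourself (`n_components` is an assumption, not something derived from the data), and the helper name `gmm_clusters_1d` is hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def gmm_clusters_1d(values, n_components=2, random_state=0):
    """Hypothetical helper: cluster 1D values with a Gaussian mixture model.

    n_components is an assumed, user-chosen number of clusters.
    """
    # scikit-learn expects a 2D array of shape (n_samples, n_features).
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=random_state)
    # Hard assignment of each value to its most likely component.
    labels = gmm.fit_predict(x)
    return labels, gmm
```

For the question's data this would be called as `labels, gmm = gmm_clusters_1d(mean_vec, n_components=2)`; unlike DBSCAN, every point is assigned to a component, so there is no noise label.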

1 Answer


On one-dimensional data, use kernel density estimation (KDE) rather than DBSCAN. It is much better supported by theory and much better understood. One can see DBSCAN as a fast approximation of KDE for the multivariate case.
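One way to turn kernel density estimation into 1D clusters, following the approach in the Stack Overflow question linked in the comments, is to estimate the density on a grid and cut the data at the density's local minima. This is a minimal sketch under assumptions: a Gaussian kernel, a hand-picked `bandwidth`, and the hypothetical helper name `kde_clusters`.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def kde_clusters(values, bandwidth=0.1, grid_size=1000):
    """Hypothetical helper: split 1D data at the local minima of a Gaussian KDE.

    bandwidth is an assumed value and must be tuned for real data.
    """
    x = np.asarray(values, dtype=float).ravel()
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(x[:, None])
    # Evaluate the density on an evenly spaced grid spanning the data.
    grid = np.linspace(x.min(), x.max(), grid_size)
    density = np.exp(kde.score_samples(grid[:, None]))  # score_samples is log-density
    # An interior grid point is a valley if it is <= its left neighbour and
    # < its right neighbour (this picks one point even on a flat stretch).
    inner = density[1:-1]
    is_min = (inner <= density[:-2]) & (inner < density[2:])
    cuts = grid[1:-1][is_min]
    # Label = number of cut points to the left of each value.
    labels = np.searchsorted(cuts, x)
    return labels, cuts
```

On the question's data this would be `labels, cuts = kde_clusters(mean_vec, bandwidth=...)`. The bandwidth controls how smooth the density is, so it decides how many valleys (and therefore clusters) appear.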

Anyway, plotting one-dimensional data is not that hard. For example, you can plot a histogram.
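A histogram per cluster, drawn on shared bin edges so the bars line up, is one way to do this. This sketch assumes matplotlib and the hypothetical helper name `plot_cluster_histogram`; the `Agg` backend line is only there so it runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_cluster_histogram(values, labels, bins=10):
    """Hypothetical helper: overlaid histograms, one per cluster label."""
    x = np.asarray(values, dtype=float).ravel()
    labels = np.asarray(labels)
    fig, ax = plt.subplots()
    # Shared bin edges so every cluster's bars use the same bins.
    edges = np.histogram_bin_edges(x, bins=bins)
    for lab in np.unique(labels):
        name = "noise" if lab == -1 else f"cluster {lab}"
        ax.hist(x[labels == lab], bins=edges, alpha=0.6, label=name)
    ax.set_xlabel("mean value")
    ax.set_ylabel("count")
    ax.legend()
    return fig, ax
```

With the question's variables this would be `plot_cluster_histogram(mean_vec, db.labels_)`, since the labels are in the same order as the input rows.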

Also, in one dimension the clusters will necessarily correspond to intervals, so you can plot a line from the min to the max of each cluster.
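The interval idea can be sketched as one horizontal bar per cluster, spanning that cluster's (min, max), with the member points drawn on top. The helper names `cluster_intervals` and `plot_intervals` are hypothetical, and skipping DBSCAN's noise label (-1) is a choice made here, not something the answer prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove in a notebook
import matplotlib.pyplot as plt
import numpy as np


def cluster_intervals(values, labels):
    """Hypothetical helper: (min, max) of each cluster, skipping noise (-1)."""
    x = np.asarray(values, dtype=float).ravel()
    labels = np.asarray(labels)
    return {
        int(lab): (float(x[labels == lab].min()), float(x[labels == lab].max()))
        for lab in np.unique(labels)
        if lab != -1
    }


def plot_intervals(values, labels):
    """Hypothetical helper: one bar per cluster at y = label, plus its members."""
    x = np.asarray(values, dtype=float).ravel()
    labels = np.asarray(labels)
    fig, ax = plt.subplots()
    for lab, (lo, hi) in cluster_intervals(x, labels).items():
        ax.hlines(lab, lo, hi, linewidth=6, alpha=0.4)  # the (min, max) interval
        members = x[labels == lab]
        ax.plot(members, np.full(members.size, lab), "o")  # the points in it
    ax.set_xlabel("mean value")
    ax.set_ylabel("cluster")
    return fig, ax
```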

You can even abuse 2D scatter plots: simply use the cluster label as the y value.
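Using the label as the y value puts each cluster on its own row, and noise (label -1) on a row of its own. A minimal matplotlib sketch, with the hypothetical helper name `plot_label_scatter`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_label_scatter(values, labels):
    """Hypothetical helper: x = the 1D value, y = its cluster label."""
    x = np.asarray(values, dtype=float).ravel()
    labels = np.asarray(labels)
    fig, ax = plt.subplots()
    # Colour by label as well, so clusters are easy to tell apart.
    ax.scatter(x, labels, c=labels, cmap="tab10")
    ax.set_yticks(np.unique(labels))
    ax.set_xlabel("mean value")
    ax.set_ylabel("cluster label (-1 = noise)")
    return fig, ax
```

With the question's variables this would be `plot_label_scatter(mean_vec, db.labels_)`. Plotting every point at a constant y instead gives the horizontal scatter chart mentioned in the comments.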

Has QUIT--Anony-Mousse