I use DBSCAN to cluster text documents as follows, thanks to this post.

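import numpy as np
from sklearn.cluster import DBSCAN

# X: feature matrix built from the documents (one row per document)
db = DBSCAN(eps=0.3, min_samples=2).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_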

Now I want to see which document belongs to which cluster, like:

[I have a car and it is blue] belongs to cluster0

or

idx [112] belongs to cluster0

A similar question was asked here, and I have already tested some of the answers provided there, such as:

X[labels == 1,:]

and I got:

array([[0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0]], dtype=int64)

but this does not help me. Please let me know if you have any suggestions on how to do this.

Has QUIT--Anony-Mousse
Bilgin

1 Answer

If you have a pandas dataframe df with columns idx and messages, then all you have to do is

df['cluster'] = db.labels_

in order to get a new column cluster with the cluster membership.

Here is a short demo with dummy data:

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [5, 8], [2, 3],
               [8, 7], [8, 8], [2, 2]])

db = DBSCAN(eps=3, min_samples=2).fit(X)
db.labels_
# array([0, 1, 0, 1, 1, 0], dtype=int64)

# convert our numpy array to pandas:
df = pd.DataFrame({'Column1': X[:, 0], 'Column2': X[:, 1]})
print(df)
# result:
#    Column1  Column2
# 0        1        2
# 1        5        8
# 2        2        3
# 3        8        7
# 4        8        8
# 5        2        2

# add new column with the cluster each row belongs to:
df['cluster'] = db.labels_

print(df)
# result:
#    Column1  Column2  cluster
# 0        1        2        0
# 1        5        8        1
# 2        2        3        0
# 3        8        7        1
# 4        8        8        1
# 5        2        2        0
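If you want the "document belongs to cluster" listing from the question without going through pandas, here is a minimal sketch. It assumes your documents are in a list in the same order as the rows of `X`. The helper `documents_by_cluster` and the sample sentences are made up for illustration, and the `labels` list is a stand-in for the `db.labels_` of the demo above. Note that DBSCAN labels noise points as `-1`, so those are not members of any cluster.

```python
from collections import defaultdict


def documents_by_cluster(docs, labels):
    """Group (index, document) pairs by cluster label.

    docs[i] must be the document that produced row i of X.
    """
    if len(docs) != len(labels):
        raise ValueError("docs and labels must have the same length")
    clusters = defaultdict(list)
    for idx, (doc, label) in enumerate(zip(docs, labels)):
        # int() also handles the numpy integers found in db.labels_
        clusters[int(label)].append((idx, doc))
    return dict(clusters)


# Hypothetical documents, in the same order as the rows of X
docs = [
    "I have a car and it is blue",
    "The weather is nice today",
    "My car is blue",
    "It is sunny today",
    "Nice sunny weather",
    "I have a blue car",
]

# Stand-in for db.labels_ (values taken from the demo above)
labels = [0, 1, 0, 1, 1, 0]

clusters = documents_by_cluster(docs, labels)
for label in sorted(clusters):
    name = "noise" if label == -1 else f"cluster{label}"
    for idx, doc in clusters[label]:
        print(f"idx [{idx}] [{doc}] belongs to {name}")
```

With real data you would pass `db.labels_` directly instead of the hard-coded list.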
desertnaut
  • @Bilgin don't know much about DBSCAN itself (have never used it), plus this is an arguably off-topic question for SO (not that I wouldn't answer if I knew it, of course), which is about *coding* issues. ML methodology issues should be addressed to [Cross Validated](https://stats.stackexchange.com/help/on-topic). – desertnaut Jul 02 '19 at 17:57