I have a set of sentences and a corresponding averaged vector value for each sentence. The dataset looks like this:

['there are two injuries one is previous']  -0.003632369
['I have motion with mucus from morning']   -0.000631669
['she will be fine with the meds?']     0.010474829
['can you please suggest some good diet']   0.008024994

I have around 100K rows and want to cluster similar sentences together. Any idea how this can be done? I tried several clustering algorithms from sklearn, but they all fail with a similar error, such as "ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2".

Adding code sample:

import pandas as pd
from sklearn.cluster import OPTICS

dataset = pd.read_csv(r'Documents/clusters.csv')

X = dataset.iloc[:, 1]
X = X.values.reshape(1, -1)

db = OPTICS(eps=3, min_samples=0).fit(X)
Erich
  • Can you share your code? There is probably a bug in it. – Cenk Bircanoglu Nov 12 '20 at 08:33
  • Added the code sample. Can you please check? – Erich Nov 12 '20 at 08:58
  • Running these two lines, `X = dataset.iloc[:, 1]; X = X.values.reshape(1, -1)`, leaves you with a single sample that has many features, and a clustering algorithm can't work on a single sample. Can you check the shape of X before calling `fit`? – Cenk Bircanoglu Nov 12 '20 at 09:00
  • maybe [this](https://stackoverflow.com/questions/49395939/smote-initialisation-expects-n-neighbors-n-samples-but-n-samples-n-neighbo) can help. – satinder singh Nov 12 '20 at 09:14
  • @CenkBircanoglu I printed the shape before the fit call. It reads (1, 72610), where 72610 is actually the number of rows of sentences. Any idea what to do from here? – Erich Nov 12 '20 at 09:31
  • Can you use `X = X.values.reshape(-1, 1)` instead of `X = X.values.reshape(1, -1)` and try again? – Cenk Bircanoglu Nov 12 '20 at 10:26
  • I tweaked the k-means application and it is working now without changing the reshape part. `kmeans.labels_` is giving me the groups. Thanks a lot for your help :) – Erich Nov 12 '20 at 11:40
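
For anyone hitting the same error, here is a minimal sketch of the reshape point raised in the comments. It is not the asker's final code (that was never posted). The `values` array is a hypothetical stand-in for the second CSV column, and `n_clusters=2` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the second CSV column:
# one averaged value per sentence (taken from the sample rows in the question).
values = np.array([-0.003632369, -0.000631669, 0.010474829, 0.008024994])

# reshape(1, -1) produces ONE sample with n features,
# which is what triggers "n_samples = 1" errors in sklearn.
X_wrong = values.reshape(1, -1)
print(X_wrong.shape)  # (1, 4)

# reshape(-1, 1) produces n samples with one feature each,
# which is the layout sklearn estimators expect here.
X = values.reshape(-1, 1)
print(X.shape)  # (4, 1)

# n_clusters=2 is an assumption for this toy data, not a recommendation.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_  # one cluster id per sentence
print(labels)
```

sklearn estimators read rows as samples and columns as features, so it is worth checking `X.shape` before calling `fit`.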
