How can we cluster average vectors in Python

Question

I have a set of sentences, each with a corresponding averaged vector value. The dataset looks like this:

['there are two injuries one is previous']  -0.003632369
['I have motion with mucus from morning']   -0.000631669
['she will be fine with the meds?']     0.010474829
['can you please suggest some good diet']   0.008024994

I have around 100K rows and want to cluster similar sentences together. Any idea how this can be done? I tried different clustering algorithms from sklearn, but they all fail with a similar error, e.g. "ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 2".

Adding code sample:

import pandas as pd

dataset = pd.read_csv(r'Documents/clusters.csv')

X = dataset.iloc[:, 1]
X = X.values.reshape(1, -1)

from sklearn.cluster import OPTICS
db = OPTICS(eps=3, min_samples=0).fit(X)

Can you share your code? There is probably a bug in it. – Cenk Bircanoglu, Nov 12 '20 at 08:33
After running these two lines, `X = dataset.iloc[:, 1]; X = X.values.reshape(1, -1)`, you end up with a single sample that has many features, and a clustering algorithm can't work on one sample. Can you check the shape of X before calling `fit`? – Cenk Bircanoglu, Nov 12 '20 at 09:00
Maybe [this](https://stackoverflow.com/questions/49395939/smote-initialisation-expects-n-neighbors-n-samples-but-n-samples-n-neighbo) can help. – satinder singh, Nov 12 '20 at 09:14
@CenkBircanoglu I printed the shape before the fit call. It reads (1, 72610), where 72610 is actually the number of sentence rows. Any idea what to do from here? – Erich, Nov 12 '20 at 09:31
Can you use `X = X.values.reshape(-1, 1)` instead of `X = X.values.reshape(1, -1)` and try again? – Cenk Bircanoglu, Nov 12 '20 at 10:26
I tweaked the k-means application and it is working now without changing the reshape part. `kmeans.labels_` is giving me the groups. Thanks a lot for your help :) – Erich, Nov 12 '20 at 11:40
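
For anyone landing here with the same error, the shape issue discussed in the comments can be illustrated with a minimal sketch. The `values` array below is made-up data standing in for the score column of `clusters.csv`, and the choice of `KMeans` with `n_clusters=2` is an assumption for the demo, not the setup the asker finally used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the score column: two well-separated groups of values.
rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(-0.01, 0.001, 50),
    rng.normal(0.01, 0.001, 50),
])

# reshape(1, -1) yields ONE sample with 100 features, which is what
# leads to errors such as "n_samples = 1".
wrong = values.reshape(1, -1)

# reshape(-1, 1) yields 100 samples with one feature each, the layout
# scikit-learn estimators expect for a single numeric column.
right = values.reshape(-1, 1)

print(wrong.shape, right.shape)

# Cluster the correctly shaped data; n_clusters=2 is an assumption for this demo.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(right)
labels = kmeans.labels_
```

Keep in mind that with a single number per sentence, the resulting clusters are simply intervals on a line.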
