I'm new to clustering and I'm learning about text clustering. I found a way to make clusters, and now I'm trying to find a way to plot them. This is the error I get when I try to plot the clusters:
ValueError: setting an array element with a sequence.
This is my code:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
x = ['this is very good show' , 'i had a great time on my school trip', 'such a boring movie', 'Springbreak was amazing',
'i love this product' , 'this is an amazing item', 'this food is delicious', 'I had a great time last night',
'this is my favourite restaurant' , 'i love this food, its so good', 'skiing is the best sport', 'what is this',
'I love basketball, its very dynamic' , 'its a shame that you missed the trip', 'game last night was amazing',
'such a nice song' , 'this is the best movie ever', 'hawaii is the best place for trip','how that happened',
'I cant believe that you did that', 'Why are you doing that, I do not gete it', 'this is tasty']
cv = CountVectorizer(analyzer = 'word', max_features = 5000, lowercase=True, preprocessor=None, tokenizer=None, stop_words = 'english')
x = cv.fit_transform(x)
my_list = []
for i in range(1,8):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 0)
    kmeans.fit(x)
    my_list.append(kmeans.inertia_)
plt.plot(range(1,8),my_list)
plt.show()
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 0)
y_kmeans = kmeans.fit_predict(x)
plt.scatter(x[y_kmeans == 0,0], x[y_kmeans==0,1], s = 15, c= 'red', label = 'Cluster_1')
plt.scatter(x[y_kmeans == 1,0], x[y_kmeans==1,1], s = 15, c= 'blue', label = 'Cluster_2')
plt.scatter(x[y_kmeans == 2,0], x[y_kmeans==2,1], s = 15, c= 'green', label = 'Cluster_3')
plt.scatter(x[y_kmeans == 3,0], x[y_kmeans==3,1], s = 15, c= 'cyan', label = 'Cluster_4')
plt.scatter(x[y_kmeans == 4,0], x[y_kmeans==4,1], s = 15, c= 'magenta', label = 'Cluster_5')
plt.scatter(kmeans.cluster_centers_[:,0], kmeans.cluster_centers_[:,1], s = 100, c = 'black', label = 'Centroids')
plt.show()
What am I doing wrong? I want to see which sentences are grouped in each cluster. Is it even possible to plot them like this? And how can I test the significance of the clusters found?
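
For reference, the likely cause of the error is that `cv.fit_transform` returns a SciPy sparse matrix, so slices such as `x[y_kmeans == 0, 0]` are still sparse and `plt.scatter` cannot turn them into plain number arrays. Columns 0 and 1 would also only be the counts of two individual words, not a 2D picture of the sentences. The following is a minimal sketch of one possible approach, not a definitive solution: it keeps the raw sentences in a separate variable, lists them per cluster, projects the features to two dimensions with `TruncatedSVD` (a technique swapped in here only for plotting), and reports a silhouette score as one common heuristic for cluster quality (it is not a formal significance test). The names `sentences`, `features`, `coords`, and `clusters`, the subset of sentences, and the choice of 3 clusters are assumptions made for illustration.

```python
from collections import defaultdict

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import silhouette_score

# A subset of the sentences from the question (chosen for illustration).
sentences = [
    'this is very good show', 'such a boring movie', 'this is the best movie ever',
    'i love this food, its so good', 'this food is delicious', 'this is tasty',
    'hawaii is the best place for trip', 'its a shame that you missed the trip',
    'i had a great time on my school trip', 'I had a great time last night',
]

# Keep the raw sentences in their own variable so they are not overwritten.
features = CountVectorizer(stop_words='english').fit_transform(sentences)

n_clusters = 3  # illustrative choice, not tuned
kmeans = KMeans(n_clusters=n_clusters, init='k-means++', n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# List which sentences fall into each cluster.
clusters = defaultdict(list)
for sentence, label in zip(sentences, labels):
    clusters[int(label)].append(sentence)
for label in sorted(clusters):
    print(label, clusters[label])

# Project the sparse count matrix to 2 dense dimensions purely for plotting.
svd = TruncatedSVD(n_components=2, random_state=0)
coords = svd.fit_transform(features)
centers = svd.transform(kmeans.cluster_centers_)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap='tab10', s=15)
ax.scatter(centers[:, 0], centers[:, 1], c='black', s=100, label='Centroids')
ax.legend()
# plt.show()  # uncomment to display the figure interactively

# Silhouette score lies between -1 and 1; higher suggests better-separated clusters.
score = silhouette_score(features, labels)
print('silhouette:', score)
```

With so few short sentences the 2D projection and the score are only rough indicators; comparing the silhouette score across several values of `n_clusters` may give a better sense of whether the grouping is meaningful.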