I'm trying to use sklearn to cluster some tweets that are stored in a dictionary. I have 25 initial centroid ids (tweet ids). I already wrote the clustering with my own functions, but I don't know how to implement it with sklearn.
# {845512:'tweet id 845512', 543115:'tweet id 543115', ...}
# initial_centroids = [845512, 546318, 84632, ...] - 25 centroids
NOTE: tweets_vec has to be built from Jaccard distances.
tweets_vec = the Jaccard distance matrix (it may be wrong, I don't know)
kmeans = KMeans(n_clusters=25, init=initial_seeds).fit(tweets_vec)
I built a 2D matrix that contains the Jaccard distances. I don't know how to set the init argument of KMeans: it raises an error saying the value I pass is not an ndarray.
What exactly should I pass to it?
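
From the docs, my understanding is that init accepts either a string or an array of shape (n_clusters, n_features), so I probably need to turn the centroid ids into the matching rows of tweets_vec. Here is a minimal sketch of what I think that looks like. The random matrix, the 6-tweet size, the ids 111111 and 222222, and the names tweet_ids, id_to_row and init_rows are made up for illustration. I also believe KMeans always uses Euclidean distance on the rows, so it would treat each row of my distance matrix as a feature vector rather than as precomputed distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for my data: 6 tweets instead of the real set.
# tweet_ids gives the row order used when building tweets_vec.
tweet_ids = [845512, 543115, 546318, 84632, 111111, 222222]

# Placeholder for the Jaccard distance matrix (random values here).
rng = np.random.default_rng(0)
tweets_vec = rng.random((len(tweet_ids), len(tweet_ids)))

# 3 initial centroids here instead of 25, to keep the example small.
initial_centroids = [845512, 546318, 84632]

# Map each tweet id to its row index, then pick the centroid rows.
id_to_row = {tid: i for i, tid in enumerate(tweet_ids)}
init_rows = [id_to_row[tid] for tid in initial_centroids]

# init must be an ndarray of shape (n_clusters, n_features),
# not a list of ids.
initial_seeds = tweets_vec[init_rows]

# n_init=1 because the starting centroids are given explicitly.
# Note: KMeans measures Euclidean distance between rows; it does not
# interpret tweets_vec as a precomputed distance matrix.
kmeans = KMeans(
    n_clusters=len(initial_centroids),
    init=initial_seeds,
    n_init=1,
).fit(tweets_vec)

print(kmeans.labels_)
```

If this is right, the error came from passing the ids themselves (or a plain list) instead of the rows they refer to. I'm still not sure that running KMeans on a distance matrix gives Jaccard-based clusters at all.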