
I derived a term-term co-occurrence matrix, K, from a Document-Term Matrix (DTM) in R. I want to run a k-means clustering analysis on this keyword-by-keyword matrix K, which is 8962 terms x 8962 terms.
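
For context, one common way to build a term-term co-occurrence matrix is crossprod(dtm), i.e. t(dtm) %*% dtm. This assumes the DTM is an ordinary numeric matrix with documents in rows and terms in columns. The question does not say exactly how K was built, so this is only a sketch on a small made-up DTM (the term names are hypothetical).

```r
# Toy document-term matrix: 3 documents (rows) x 4 terms (columns).
# Hypothetical data, for illustration only.
dtm <- matrix(c(1, 0, 2, 0,
                0, 1, 1, 0,
                1, 1, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("doc", 1:3),
                              c("apple", "banana", "cherry", "date")))

# Term-term co-occurrence: K[i, j] = sum over documents d of dtm[d, i] * dtm[d, j]
K <- crossprod(dtm)   # same result as t(dtm) %*% dtm
dim(K)                # one row and one column per term
```

With 8962 terms, the same construction gives the 8962 x 8962 matrix described above.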

I pass K to the kmeans function as follows:

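cost_df <- data.frame(k = integer(0), tot_withinss = numeric(0))  # start with an empty results table

for(i in 1:25){
    #Run kmeans for each level of i, allowing up to 100 iterations for convergence
    km <- kmeans(x=K, centers=i, iter.max=100)  # avoid masking the kmeans() function name

    #Combine cluster number and cost together, append to cost_df
    cost_df <- rbind(cost_df, data.frame(k=i, tot_withinss=km$tot.withinss))
}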

Running the above code on my original Document-Term Matrix (590 documents x 8962 terms) works without any hanging. With the keyword-by-keyword matrix, however, R hangs, presumably because of its size. Any suggestions on how to get around this would be helpful.

newdev14
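The size difference between the two inputs can be quantified with a back-of-the-envelope calculation. This assumes both are held as dense double-precision matrices (8 bytes per cell, which is how R stores numeric matrices) and uses only the dimensions stated in the question.

```r
# Rough memory footprint, in MiB, of a dense double-precision matrix (8 bytes per cell).
dense_mib <- function(nrow, ncol) nrow * ncol * 8 / 2^20

dtm_mib <- dense_mib(590, 8962)    # original DTM: about 40.3 MiB
k_mib   <- dense_mib(8962, 8962)   # keyword-by-keyword matrix K: about 612.8 MiB

round(c(DTM = dtm_mib, K = k_mib), 1)

# Distance work per k-means iteration grows roughly with rows * centers * columns,
# so for the same number of centers K needs about 8962 / 590 (~15) times more work.
work_ratio <- (8962 * 8962) / (590 * 8962)
```

The loop also refits the model 25 times, so that factor is paid on every fit.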

2 Answers


k-means requires coordinates, because it needs to be able to compute means (that is why it's called k-means).

What you have there is a kind of similarity matrix. Choose a different clustering algorithm instead, one that works with similarities or distances.

Has QUIT--Anony-Mousse
  • R still hangs when I try hierarchical clustering. I think the size of the matrix is the problem, but I'm not sure how to get around it. – newdev14 May 03 '16 at 16:10
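
One way to follow the "use a different algorithm" advice in base R is sketched here under assumptions. It normalises co-occurrence counts to cosine similarities, converts them to dissimilarities with 1 - similarity, and clusters with hclust(). Average linkage is an arbitrary choice here, and the toy K and k = 2 are hypothetical. At full size, the dist object for 8962 terms holds 8962 x 8961 / 2 = 40,154,241 doubles (about 306 MiB). That fits the hanging described in the comment above, so this does not remove the size problem on its own.

```r
# Hypothetical toy co-occurrence matrix K (terms x terms), e.g. from crossprod(dtm).
terms <- c("apple", "banana", "cherry", "date")
K <- matrix(c(2, 1, 2, 1,
              1, 2, 1, 1,
              2, 1, 5, 0,
              1, 1, 0, 1),
            nrow = 4, byrow = TRUE, dimnames = list(terms, terms))

# Normalise co-occurrence counts to cosine similarity in [0, 1]:
# sim[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])
d <- sqrt(diag(K))
sim <- K / outer(d, d)

# Convert similarity to dissimilarity, then cluster hierarchically (no coordinates needed).
dis <- as.dist(1 - sim)
hc <- hclust(dis, method = "average")
groups <- cutree(hc, k = 2)   # one cluster label per term
groups
```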

Your matrices are large but VERY sparse. Try using a sparse matrix.

Ray
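
The sparse-matrix suggestion can be sketched with the Matrix package, a recommended package that ships with R. This assumes the DTM can be expressed as sparse triplets. The data below are random and purely hypothetical. They only show that crossprod() on a sparse DTM yields a sparse K without ever building the dense 8962 x 8962 matrix.

One caveat: as far as I know, stats::kmeans() calls as.matrix() on its input, so a sparse K passed to it would still be densified. The sparse form helps with building and storing K. The clustering step itself needs a method that accepts sparse input.

```r
library(Matrix)  # recommended package shipped with R

# Hypothetical sparse DTM: 590 documents x 8962 terms with 20,000 random nonzero cells,
# stored in compressed sparse column (dgCMatrix) format. Duplicate (i, j) pairs are summed.
set.seed(1)
n_docs <- 590; n_terms <- 8962; nnz <- 20000
dtm <- sparseMatrix(i = sample(n_docs, nnz, replace = TRUE),
                    j = sample(n_terms, nnz, replace = TRUE),
                    x = 1, dims = c(n_docs, n_terms))

# Term-term co-occurrence computed entirely in sparse form
K <- crossprod(dtm)

dense_bytes  <- n_terms^2 * 8                 # what a dense double matrix would need
sparse_bytes <- as.numeric(object.size(K))    # what the sparse K actually uses
c(dense = dense_bytes, sparse = sparse_bytes)
```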