I am receiving the error `Error in x$j : $ operator is invalid for atomic vector` while running the R code below for k-means clustering.

There are a few early steps that can be ignored, since they are there for other reasons, but I have included them for completeness.

The error is thrown by the line `findFreqTerms(dtm[cl$cluster==1], 50)`. Everything else works fine.

library(tm)
data(crude)


# create mycorpus first using this line
mycorpus<-Corpus(VectorSource(crude))

# create document-term matrix using tf-idf weighting for further processing
dtm <- DocumentTermMatrix(mycorpus)
dtm_tfxidf <- weightTfIdf(dtm)


## do document clustering
### k-means (this uses euclidean distance)
m <- as.matrix(dtm_tfxidf)
rownames(m) <- 1:nrow(m)

### don't forget to normalize the vectors so Euclidean makes sense
norm_eucl <- function(m) m/apply(m, MARGIN=1, FUN=function(x) sum(x^2)^.5)
m_norm <- norm_eucl(m)


### cluster into 10 clusters
cl <- kmeans(m_norm, 10)
cl

table(cl$cluster)

### show clusters using the first 2 principal components
plot(prcomp(m_norm)$x, col=cl$cl)

findFreqTerms(dtm[cl$cluster==1], 50) #this is not working
inspect(mycorpus[which(cl$cluster==1)])

Any assistance is greatly appreciated.

user1222447

Please edit your question to include a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If we don't have any sample input data, we can't run the code to generate the same error, which makes it difficult to help. If the question is just about `kmeans()`, try to remove all the irrelevant `tm` code. – MrFlick Jan 18 '15 at 00:56

1 Answer

The problem is that the document-term matrix `dtm` must be indexed as a 2D matrix, with both a row index and a column index. The call should be:

findFreqTerms(dtm[cl$cluster==1,], 50)

(Notice the extra comma: the column index is left empty, so all columns are returned.)
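
As a rough illustration of why the comma matters, here is a sketch using a plain base R matrix as a stand-in for the document-term matrix. The toy data and the names `m`, `cluster`, `flat`, and `rows` are made up for this example. A `DocumentTermMatrix` is a sparse object, so its internals differ, but rows are still documents and columns are still terms.

```r
# Toy stand-in for a document-term matrix: 4 documents (rows) x 3 terms (columns).
# The term names and counts are invented for illustration only.
m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("doc", 1:4), c("oil", "price", "market")))

# Hypothetical cluster assignment, one entry per document (row).
cluster <- c(1, 2, 1, 2)

# Without the comma, the matrix is treated as one long vector:
# the result is a plain vector and the row/column structure is lost.
flat <- m[cluster == 1]

# With the comma, the logical vector selects rows and all columns are kept.
rows <- m[cluster == 1, ]

is.null(dim(flat))  # TRUE
dim(rows)           # 2 3
rownames(rows)      # "doc1" "doc3"
```

A function that expects a matrix-like object, as `findFreqTerms()` does, can only work with the second form, because the first no longer has rows and columns.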

MrFlick