How can I speed up the following (noob) code:
#"mymatrix" is the matrix of word counts (docs X terms)
#"tfidfmatrix" is the transformed matrix
tfidfmatrix = Matrix(mymatrix, nrow=num_of_docs, ncol=num_of_words, sparse=T)
#Apply a transformation on each row of the matrix
for(i in 1:dim(mymatrix)[[1]]){
r = mymatrix[i,]
s = sapply(r, function(x) ifelse(x==0, 0, (1+log(x))*log((1+ndocs)/(1+x)) ) )
tfmat[i,] = s/sqrt(sum(s^2))
}
return (tfidfmatrix)
Problem is that the matrices I am working on are fairly large (~40kX100k), and this code is very slow.
The reason I am not using "apply" (instead of using a for loop and sapply) is that apply is going to give me the transpose of the matrix I want - I want num_of_docs X num_of_words, but apply will give me the transpose. I will then have to spend more time computing the transpose and re-allocating it.
Any thoughts on making this faster?
Thanks much.
Edit : I have found that the suggestions below greatly speed up my code (besides making me feel stupid). Any suggestions on where I can learn to write "optimized" R code from?
Edit 2: OK, so something is not right. Once I do s.vec[!is.finite(s.vec)] <- 0
every element of s.vec is being set to 0. Just to re-iterate my original matrix is a sparse matrix containing integers. This is due to some quirk of the Matrix
package I am using. When I do s.vec[which(s.vec==-Inf)] <- 0
things work as expected. Thoughts?