
I am generating a sparse vector of length > 50,000 in a for loop. Is there an efficient way of storing the zeros?

Basically the code looks like

score = c()
for (i in 1:length(someList)) {
score[i] = getScore(input[i], other_inputs)
if (score[i] == numeric(0))
score[i] = 0    ###I would want to do something about the zeros
}
user2498497
  • Can you give an example of the getScore function and trial data. Ideally you do not want to use for loops in R. – Geoffrey Absalom Jun 18 '13 at 19:41
  • My data set looks like the following: there are about 500,000 obs, and 2 variables, so 500,000 rows. Each row looks like: document_id, score, word, where word is a string. There are 4000 unique documents, i.e. unique row names, and 53000 unique words. What I want is a mapping of the dataset in which the rows are the 4000 unique documents and the columns are the words in the corpus. I know this matrix will be very sparse, so I would need to store it in a "sparse" manner, but I am not sure exactly how to do it. The getScore function enables me to extract entries. – user2498497 Jun 18 '13 at 20:06
  • plenty of results here on sparse matrices: http://stackoverflow.com/q/1167448/59470 and http://stackoverflow.com/q/1274171/59470 to begin with. – topchef Jun 19 '13 at 14:16

1 Answer


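This code will not work: when getScore returns numeric(0), the assignment to score[i] fails because the replacement has length zero, and the comparison score[i] == numeric(0) yields a zero-length logical that if cannot evaluate. Instead, preallocate the score vector before looping. A preallocated numeric vector is already filled with zeros, so there is no need to assign zeros explicitly; assign only the non-empty results returned by getScore.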

N <- length(someList)
score <- numeric(N)  ## preallocate a vector of N zeros
for (i in seq_len(N)) {
  ss <- getScore(input[i], other_inputs)
  ## keep the default 0 when getScore returns numeric(0)
  if (length(ss) != 0)
    score[i] <- ss
}
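
If even the preallocated zeros are unwanted, one option is to keep only the positions and values of the non-empty results and let the zeros stay implicit. The following is a minimal sketch rather than a drop-in solution: getScore, input and other_inputs are hypothetical stand-ins for the objects in the question, and it assumes the Matrix package is installed.

```r
library(Matrix)  # assumed installed; provides sparseVector()

# Hypothetical stand-ins for the objects in the question
input <- 1:10
other_inputs <- NULL
getScore <- function(x, other) if (x %% 4 == 0) x / 2 else numeric(0)

N <- length(input)
idx <- integer(0)   # positions of the non-empty scores
val <- numeric(0)   # the corresponding score values
for (i in seq_len(N)) {
  ss <- getScore(input[i], other_inputs)
  if (length(ss) != 0) {
    idx <- c(idx, i)
    val <- c(val, ss)
  }
}

# Zeros are implicit: only idx, val and the total length are stored
score <- sparseVector(x = val, i = idx, length = N)
```

Growing idx and val inside the loop is only reasonable when the non-zero entries are few; with many of them, the preallocated version above is likely the better choice.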
agstudy
  • I think the main issue is to store those zeros or "numeric(0)"s. I definitely don't want R to store 50,000 some zeros. Rather, I want some type of sparse storage (so it doesn't take so much space) within each loop. I have been looking at packages like "SparseM" or "Matrix" but I don't see a way to apply it. – user2498497 Jun 18 '13 at 20:00
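
For the document-by-word layout described in the comments, the Matrix package can build a sparse matrix directly from (document, word, score) rows, without any loop. This is a sketch under assumptions: the data frame dat and its column names are hypothetical toy data, and the Matrix package is assumed to be installed.

```r
library(Matrix)  # assumed installed; provides sparseMatrix()

# Hypothetical toy data in the long format described in the comments
dat <- data.frame(
  document_id = c("d1", "d1", "d2", "d3"),
  word        = c("apple", "pear", "apple", "plum"),
  score       = c(0.5, 1.2, 0.7, 2.0),
  stringsAsFactors = FALSE
)

# Map documents and words to integer row and column indices
docs  <- factor(dat$document_id)
words <- factor(dat$word)

# Only the (row, column, value) triplets are stored;
# every other cell is an implicit zero
m <- sparseMatrix(
  i = as.integer(docs),
  j = as.integer(words),
  x = dat$score,
  dims = c(nlevels(docs), nlevels(words)),
  dimnames = list(levels(docs), levels(words))
)
```

Entries can then be looked up by name, for example m["d1", "pear"]. Note that if the same (document, word) pair occurs more than once, sparseMatrix adds the scores together by default.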