I have a data frame "v" with id and value columns, such as:

set.seed(123)
v <- data.frame(id=sample(1:5),value=sample(1:5))
v
  id value
1  2   1
2  4   3
3  5   4
4  3   2
5  1   5

In each iteration of the loop, I want to find the row indices of v whose id matches each element of tmp, and then subset v by those indices. tmp is a sample of v$id drawn with replacement.

Here is my attempt:

iter <- 1
df <- vector(mode='list',length = iter)

for (i in 1:iter) 
{ 
  tmp <- sample(v$id, replace=T)

  index.position <- NULL
  for (j in 1:length(tmp)) {index.position <- c(index.position, which(v$id %in% tmp[j]) )}

  df[[i]] <- v[index.position,]
}
tmp
[1] 1 5 3 5 2
df
[[1]]
    id value
5    1     5
3    5     4
4    3     2
3.1  5     4
1    2     1

This works as expected. However, execution is very slow when both "v" and "iter" are large, because growing the index.position vector inside the loop is not memory-efficient.

I have also tried creating an empty matrix or list as a placeholder and assigning index.position to it as I loop, but that did not really speed up the process (reference: Growing a data.frame in a memory-efficient manner).

Edit: id is not unique in v in my real data (the example above happens to use unique ids).

ohmyan
    If `id` is unique in `v`, then sampling `id` values and matching them to get the row indices is a *very inefficient* way to sample row indices. I think you could just do `df <- replicate(n = iter, expr = v[sample(nrow(v), replace = T), ], simplify = FALSE)` – Gregor Thomas Mar 13 '17 at 23:57

1 Answer

Try to avoid nested for loops; they are extremely inefficient in R. When id is unique in v, your code is equivalent to:

for (i in 1:iter) 
{ 
  df[[i]] <- v[sample(nrow(v), replace = TRUE), ]
}

This is just a more verbose version of Gregor's solution from the comments.
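
If id is not unique in v, as the edit to the question says, sampling row numbers is no longer the same thing as sampling ids, because each sampled id has to pull in every row that carries it. A minimal sketch of one way to keep the original behaviour without growing index.position element by element: build the id-to-rows lookup once with split(), then index it with the sampled ids. The data frame below and the name rows_by_id are made up for illustration, not taken from the question.

```r
# Illustrative data with duplicated ids (an assumption, not the asker's data).
set.seed(123)
v <- data.frame(id = c(1, 2, 2, 3, 3), value = 1:5)
iter <- 3

# Build the lookup once: for each distinct id, the row positions that carry it.
rows_by_id <- split(seq_len(nrow(v)), v$id)

df <- vector(mode = "list", length = iter)
for (i in seq_len(iter)) {
  tmp <- sample(v$id, replace = TRUE)
  # For each sampled id, fetch all of its row positions, kept in sample order.
  index.position <- unlist(rows_by_id[as.character(tmp)], use.names = FALSE)
  df[[i]] <- v[index.position, ]
}
```

The lookup is indexed by name, so the sampled ids are converted with as.character(). For each sampled id this returns the same positions, in the same order, as which(v$id %in% tmp[j]) in the original inner loop, but v$id is scanned only once instead of once per sampled element.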

Feng