I'm trying to resample the rows of a data frame. I'm open to using other data structures if recommended, but my understanding is that a data frame is better for combining strings, numbers, etc.

Let's say my input is this data frame:

16  x  y  z  2
11  a  b  c  1
.........

And I'd like to build another data structure as output (I assume another data frame), like this:

16  x   y   z
16  x   y   z
11  a   b   c  
.........

I guess my main issue is how to append the content, which is in columns df[, 1:4].

Thanks in advance, p.

A5C1D2H2I1M1N2O1R2T1
user3310782

3 Answers

It's unclear from your description, but your desired output implies that you want to repeat each row (columns 1:4) as many times as the count in column 5. This should do the job:

df[rep(seq_len(nrow(df)), df[, 5]), -5]
#     V1 V2 V3 V4
# 1   16  x  y  z
# 1.1 16  x  y  z
# 2   11  a  b  c
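
To see why the one-liner works, it may help to split it into steps. This is only a sketch: the data frame below is a guess at the question's input, and the column names V1..V5 and the helper names row_ids, idx and expanded are assumptions for illustration.

```r
# Assumed reconstruction of the question's input (column names are hypothetical).
df <- data.frame(
  V1 = c(16, 11),
  V2 = c("x", "a"),
  V3 = c("y", "b"),
  V4 = c("z", "c"),
  V5 = c(2, 1),
  stringsAsFactors = FALSE
)

# Step 1: one index per row of the data frame.
row_ids <- seq_len(nrow(df))

# Step 2: repeat each row index as many times as the count in column 5.
idx <- rep(row_ids, times = df[, 5])

# Step 3: select the repeated rows and drop the count column.
expanded <- df[idx, -5]
print(expanded)
```

A row whose count is 0 simply gets no index in idx, so it is left out of the result.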
David Arenburg
  • Yes, that clever combination did the trick. It could also be done in a less elegant way with a for loop. Thank you. – user3310782 Oct 31 '14 at 15:02

Assuming you're starting with something like:

mydf
#   V1 V2 V3 V4 V5
# 1 16  x  y  z  2
# 2 11  a  b  c  1

Then, you can just use expandRows from my "splitstackshape" package, like this:

library(splitstackshape)
expandRows(mydf, count = "V5")
#     V1 V2 V3 V4
# 1   16  x  y  z
# 1.1 16  x  y  z
# 2   11  a  b  c

By default, the function assumes that you are expanding your dataset based on an existing column, but you can just as easily add a numeric vector as the count argument, and set count.is.col = FALSE.
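
As a rough sketch of that second form, assuming the counts live in a separate vector rather than in the data frame (the four-column mydf and the name counts below are made up for illustration):

```r
library(splitstackshape)

# Hypothetical input without a count column.
mydf <- data.frame(
  V1 = c(16, 11),
  V2 = c("x", "a"),
  V3 = c("y", "b"),
  V4 = c("z", "c"),
  stringsAsFactors = FALSE
)

# Assumed external counts, one per row of mydf.
counts <- c(2, 1)

# count is treated as a vector of repeat counts, not as a column name.
expanded <- expandRows(mydf, count = counts, count.is.col = FALSE)
print(expanded)
```

The counts vector should have one entry per row of the data frame; a single number repeats every row that many times.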

A5C1D2H2I1M1N2O1R2T1

If you want to sample n rows with replacement from the data frame df:

df[sample(nrow(df), n, replace=TRUE), ]
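
A small self-contained sketch of the same idea, assuming a two-row df like the one in the question and an arbitrary n; the seed is only there to make the draw repeatable, and the names idx and resampled are illustrative:

```r
# Assumed example data (column names are hypothetical).
df <- data.frame(
  V1 = c(16, 11),
  V2 = c("x", "a"),
  V3 = c("y", "b"),
  V4 = c("z", "c"),
  stringsAsFactors = FALSE
)

n <- 5        # number of rows to draw; chosen arbitrarily here
set.seed(42)  # fixes the random draw so the result can be reproduced

# Draw n row numbers from 1..nrow(df), allowing repeats.
idx <- sample(nrow(df), n, replace = TRUE)

# Pick those rows; the same row can appear several times.
resampled <- df[idx, ]
print(resampled)
```

Because replace = TRUE, n can be larger than the number of rows in df.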

Tim