Replicate rows by different N

Question

I have the following data:

mydata <- data.frame(id=c(1,2,3,4,5), n=c(2.63, 1.5, 0.5, 3.5, 4))

1) I need to repeat the row for each id according to n, rounded up. For example, n=2.63 for id=1, so the id=1 row should be replicated three times. If n=0.5, the row should appear only once.

2) Create a new variable called t, where the sum of t for each id must equal n.

3) Create another new variable called accumulated.t, the running sum of t within each id.

Here is what the output should look like:

id  n     t     accumulated.t
1   2.63  1     1
1   2.63  1     2
1   2.63  0.63  2.63
2   1.5   1     1
2   1.5   0.5   1.5
3   0.5   0.5   0.5
4   3.5   1     1
4   3.5   1     2
4   3.5   1     3
4   3.5   0.5   3.5
5   4     1     1
5   4     1     2
5   4     1     3
5   4     1     4

It sounds like you're just asking someone to code this for you which isn't how Stack Overflow works. Which part exactly are you having trouble with? Did you write any code at all? Where exactly did it fail? — MrFlick, Aug 07 '15 at 14:57
[This](http://stackoverflow.com/questions/12688717/round-up-from-5-in-r) might help, but I'm voting to close until you show some effort. — Rich Scriven, Aug 07 '15 at 15:06

akrun · Accepted Answer · 2015-08-07T15:12:30.407

Get the ceiling of the 'n' column and use that to expand the rows of 'mydata' (rep(1:nrow(mydata), ceiling(mydata$n))).

Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(mydata1)). Grouped by the 'id' column, we replicate (rep) 1 as many times as the truncated first value of 'n' (tmp <- rep(1, trunc(n[1]))). We then take the difference between the group's value of 'n' and the sum of 'tmp' (tmp2 <- n[1]-sum(tmp)). If that difference is greater than 0, we concatenate 'tmp' and 'tmp2' (c(tmp, tmp2)); if it is 0, we take only 'tmp'. The result, 'tmp3', and its cumulative sum (cumsum(tmp3)) are placed in a list to create the two columns 't' and 'taccum'.

library(data.table)
# repeat each row ceiling(n) times
mydata1 <- mydata[rep(1:nrow(mydata), ceiling(mydata$n)), ]
setDT(mydata1)[, c('t', 'taccum') := {
    tmp  <- rep(1, trunc(n[1]))    # one entry of 1 per whole unit of n
    tmp2 <- n[1] - sum(tmp)        # fractional remainder (0 if n is whole)
    tmp3 <- if (tmp2 == 0) tmp else c(tmp, tmp2)
    list(tmp3, cumsum(tmp3)) },
    by = id]
mydata1
#    id    n    t taccum
# 1:  1 2.63 1.00   1.00
# 2:  1 2.63 1.00   2.00
# 3:  1 2.63 0.63   2.63
# 4:  2 1.50 1.00   1.00
# 5:  2 1.50 0.50   1.50
# 6:  3 0.50 0.50   0.50
# 7:  4 3.50 1.00   1.00
# 8:  4 3.50 1.00   2.00
# 9:  4 3.50 1.00   3.00
#10:  4 3.50 0.50   3.50
#11:  5 4.00 1.00   1.00
#12:  5 4.00 1.00   2.00
#13:  5 4.00 1.00   3.00
#14:  5 4.00 1.00   4.00
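The same 't' column can also be written without the intermediate 'tmp' vectors. What follows is only a sketch of an alternative formulation, not the code from the answer: within each id, row k receives min(1, n - (k - 1)), so the rows are 1 until only the remainder is left. The names dt and check are illustrative. The last step checks requirement 2, that t sums to n within each id, up to floating-point tolerance.

```r
library(data.table)

mydata <- data.frame(id = c(1, 2, 3, 4, 5), n = c(2.63, 1.5, 0.5, 3.5, 4))

# Expand each row ceiling(n) times, as in the answer above
dt <- as.data.table(mydata[rep(seq_len(nrow(mydata)), ceiling(mydata$n)), ])

# Alternative formulation (an assumption, not the answer's code):
# row k of a group gets min(1, n - (k - 1))
dt[, t := pmin(1, n - (seq_len(.N) - 1)), by = id]
dt[, taccum := cumsum(t), by = id]

# Per-id check: sum(t) should equal n up to floating-point tolerance
check <- dt[, .(ok = isTRUE(all.equal(sum(t), n[1]))), by = id]
check
```

all.equal is used instead of == because values such as 2.63 - 2 are not represented exactly in floating point.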

@user9292 No problem. In the future, it is better to show what you tried to make it more interesting for others. — akrun, Aug 07 '15 at 15:13

score 0 · Answer 2 · answered Aug 07 '15 at 16:10

An alternative that utilizes base R.

mydata <- data.frame(id = c(1, 2, 3, 4, 5), n = c(2.63, 1.5, 0.5, 3.5, 4))
# one row per id, repeated ceiling(n) times; t splits n evenly across those rows
mynewdata <- data.frame(id = rep(x = mydata$id, times = ceiling(x = mydata$n)),
                        n = mydata$n[match(x = rep(x = mydata$id, ceiling(mydata$n)), table = mydata$id)],
                        t = rep(x = mydata$n / ceiling(mydata$n), times = ceiling(mydata$n)))
# cumulative sum of t within each id
mynewdata$t.accum <- unlist(x = by(data = mynewdata$t, INDICES = mynewdata$id, FUN = cumsum))

We start by creating a data.frame with three columns: id, n, and t. id is built with rep and ceiling, repeating each ID the appropriate number of times. n is obtained by using match to look up the right value in mydata$n. t is the ratio of n to the ceiling of n, repeated the appropriate number of times (again, the ceiling of n). Note that this splits n evenly across the rows of each id, so the sum of t still equals n, but the values differ from the expected output in the question (for id 1, t is about 0.877 in each of the three rows rather than 1, 1, 0.63).

Then, we use cumsum to get the cumulative sum, called through by so that it is computed separately for each ID. This relies on the rows already being sorted by id, because by returns the groups in id order. You could probably use tapply() here as well.
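If the goal is to reproduce the expected output from the question exactly (whole units first, then the remainder), a base R sketch along the following lines may help. It is an illustration rather than part of the answer above: the helper name split_units and the variable out are made up for this example, and ave() is used in place of by() because it returns the grouped cumsum in the original row order.

```r
mydata <- data.frame(id = c(1, 2, 3, 4, 5), n = c(2.63, 1.5, 0.5, 3.5, 4))

# Hypothetical helper: split x into 1s plus a final fractional remainder
split_units <- function(x) {
  whole <- rep(1, floor(x))
  rest <- x - floor(x)
  if (rest > 0) c(whole, rest) else whole
}

# Repeat each row ceiling(n) times
out <- mydata[rep(seq_len(nrow(mydata)), ceiling(mydata$n)), ]
rownames(out) <- NULL

# One t vector per original row, concatenated in row order
out$t <- unlist(lapply(mydata$n, split_units))

# ave() applies cumsum within each id and keeps the original row order
out$t.accum <- ave(out$t, out$id, FUN = cumsum)
out
```

Because split_units returns exactly ceiling(x) values for each positive x, the length of the concatenated t matches the number of expanded rows.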
