R: "Disaccumulate" count in dataset variable to individual rows

Question

I am a beginner in R and have spent several hours trying to solve a problem.

I would like to "disaccumulate" the count variable in a dataset, so that each counted observation gets its own row. I think this is best explained with an example.

I would like to go from:

Variable 1   Variable 2   Count
 GROUP1         A           3
 GROUP1         B           2
 GROUP2         A           2
 GROUP2         B           4

to:

Variable 1   Variable 2   Count
 GROUP1         A           1
 GROUP1         A           1
 GROUP1         A           1
 GROUP1         B           1
 GROUP1         B           1
 GROUP2         A           1
 GROUP2         A           1
 GROUP2         B           1
 GROUP2         B           1
 GROUP2         B           1
 GROUP2         B           1

I think I might be able to solve this with apply. I have tried melt, apply with strsplit, and xtabs, but none of them gave me the result I wanted.

Thank you very much in advance, greetings.

score 3 · Answer 1 · user20650 · 2015-02-20T23:36:25.117

Using @bgoldst's dataset, you can repeat each row the number of times given by the count variable.

EDIT: As Marat suggests in the comments, you can index the rows by position (which feels a bit safer):

y <- x[rep(1:nrow(x), x$count), ]

instead of

y <- x[rep(row.names(x), x$count), ]

and then set count to one

y$count <- 1
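For readers who want to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the data frame `x` from the accepted answer, and it makes two small substitutions that are not in the original answer: `seq_len(nrow(x))` in place of `1:nrow(x)`, and a final reset of the row names.

```r
# Example data, copied from the accepted answer's data frame
x <- data.frame(
  v1 = c("GROUP1", "GROUP1", "GROUP2", "GROUP2"),
  v2 = c("A", "B", "A", "B"),
  count = c(3, 2, 2, 4)
)

# Repeat row position i exactly x$count[i] times, then subset by position
idx <- rep(seq_len(nrow(x)), times = x$count)
y <- x[idx, ]

# Each expanded row now stands for a single observation
y$count <- 1

# Subsetting with repeated positions leaves row names such as "1.1", "1.2";
# reset them to a plain 1..n sequence
rownames(y) <- NULL

# The expanded data should have one row per counted observation
nrow(y) == sum(x$count)
```

`seq_len()` is used here only because it also behaves sensibly on a data frame with zero rows, where `1:nrow(x)` would produce `c(1, 0)`.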

edited Feb 20 '15 at 23:36

answered Feb 20 '15 at 23:08


Nice one. As an alternative, you could also use `y <- x[rep(1:nrow(x), x$count), ]` – Marat Talipov Feb 20 '15 at 23:15
Yes @MaratTalipov; that is definitely better - cheers – user20650 Feb 20 '15 at 23:19
ha, ty again - btw in future, please feel free to edit my answers – user20650 Feb 20 '15 at 23:24

score 3 · Answer 2 · answered Feb 21 '15 at 05:56

An option using splitstackshape:

library(splitstackshape)
library(data.table)  # attached explicitly for setDT() and `:=`
expandRows(setDT(df1), 'Count', drop=FALSE)[,Count:=1][]
#    Variable 1 Variable 2 Count
#1:     GROUP1          A     1
#2:     GROUP1          A     1
#3:     GROUP1          A     1
#4:     GROUP1          B     1
#5:     GROUP1          B     1
#6:     GROUP2          A     1
#7:     GROUP2          A     1
#8:     GROUP2          B     1
#9:     GROUP2          B     1
#10:    GROUP2          B     1
#11:    GROUP2          B     1

data

df1 <- structure(list(`Variable 1` = c("GROUP1", "GROUP1", "GROUP2", 
"GROUP2"), `Variable 2` = c("A", "B", "A", "B"), Count = c(3L, 
2L, 2L, 4L)), .Names = c("Variable 1", "Variable 2", "Count"),
class = "data.frame", row.names = c(NA, -4L))

score 2 · Accepted Answer · answered Feb 20 '15 at 21:51

Here's a solution:

r> x <- data.frame(v1=c('GROUP1','GROUP1','GROUP2','GROUP2'), v2=c('A','B','A','B'), count=c(3,2,2,4) );
r> x;
      v1 v2 count
1 GROUP1  A     3
2 GROUP1  B     2
3 GROUP2  A     2
4 GROUP2  B     4
r> y <- cbind(do.call(rbind, lapply(1:nrow(x), function(r) do.call(rbind, replicate(x[r,'count'], x[r,names(x)[names(x)!='count']], simplify=FALSE ) ) ) ), count=1 );
r> rownames(y) <- 1:nrow(y);
r> y;
       v1 v2 count
1  GROUP1  A     1
2  GROUP1  A     1
3  GROUP1  A     1
4  GROUP1  B     1
5  GROUP1  B     1
6  GROUP2  A     1
7  GROUP2  A     1
8  GROUP2  B     1
9  GROUP2  B     1
10 GROUP2  B     1
11 GROUP2  B     1
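One way to check that an expansion like this lost nothing is to sum the rows back up and compare with the original counts. The sketch below is an illustrative round trip, not part of the original answer; it builds `y` with the shorter `rep()` indexing for brevity and uses base R's `aggregate()` for the re-accumulation. The name `back` is made up for this example.

```r
# Original data, as in the session above
x <- data.frame(
  v1 = c("GROUP1", "GROUP1", "GROUP2", "GROUP2"),
  v2 = c("A", "B", "A", "B"),
  count = c(3, 2, 2, 4)
)

# Expanded data: one row per counted observation, each with count 1
y <- x[rep(seq_len(nrow(x)), times = x$count), ]
y$count <- 1

# Re-accumulate: sum count within each v1/v2 combination
back <- aggregate(count ~ v1 + v2, data = y, FUN = sum)

# Put the groups in the same order as x before comparing
back <- back[order(back$v1, back$v2), ]
rownames(back) <- NULL

# Compare the recovered counts with the original ones
all.equal(back$count, x$count)
```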

Yes! It's just what I needed. Many thanks @bgoldst!! And thanks for your speed. Greetings! – Charles, Feb 20 '15 at 22:09
