1

I am a beginner in R and have spent several hours trying to solve a problem.

I would like to "disaccumulate" the count variable in a dataset, so that each individual observation gets its own row. I think this is best explained with an example.

I would like to go from:

Variable 1   Variable 2   Count
 GROUP1         A           3
 GROUP1         B           2
 GROUP2         A           2
 GROUP2         B           4

to:

Variable 1   Variable 2   Count
 GROUP1         A           1
 GROUP1         A           1
 GROUP1         A           1
 GROUP1         B           1
 GROUP1         B           1
 GROUP2         A           1
 GROUP2         A           1
 GROUP2         B           1
 GROUP2         B           1
 GROUP2         B           1
 GROUP2         B           1

I think I may be able to approach this with apply. I have tried melt, apply with strsplit, and xtabs, but none of them gave me the result I want.

Thank you very much in advance, greetings.

bgoldst
Charles

3 Answers

3

Using @bgoldst's dataset:

You can repeat each row the number of times given by the count variable.

EDIT: As Marat suggests in the comments, you can use the following (which feels a bit safer):

y <- x[rep(1:nrow(x), x$count), ]

instead of

y <- x[rep(row.names(x), x$count), ]

and then set count to one:

y$count <- 1
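
For readers who want to run the steps above end to end, here is a minimal self-contained sketch. It assumes the data frame x from @bgoldst's answer; the helper name idx, the use of seq_len() in place of 1:nrow(x), and the final row-name reset are additions for illustration, not part of the original answer.

```r
# Sample data, matching the data frame x defined in @bgoldst's answer
x <- data.frame(v1 = c("GROUP1", "GROUP1", "GROUP2", "GROUP2"),
                v2 = c("A", "B", "A", "B"),
                count = c(3, 2, 2, 4))

# Repeat each row index as many times as that row's count
# (idx is a hypothetical helper name used only in this sketch)
idx <- rep(seq_len(nrow(x)), times = x$count)

# Indexing with the repeated indices duplicates the rows
y <- x[idx, ]

# Every expanded row now stands for a single observation
y$count <- 1

# Optional: replace row names such as "1.1" and "1.2" with a plain sequence
rownames(y) <- NULL

print(y)
```

seq_len(nrow(x)) behaves like 1:nrow(x) for non-empty data, but it also returns an empty index when x has zero rows, which is why it is used here.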
user20650
3

An option using splitstackshape:

library(splitstackshape)
library(data.table)  # setDT() and := come from data.table
expandRows(setDT(df1), 'Count', drop=FALSE)[,Count:=1][]
#    Variable 1 Variable 2 Count
#1:     GROUP1          A     1
#2:     GROUP1          A     1
#3:     GROUP1          A     1
#4:     GROUP1          B     1
#5:     GROUP1          B     1
#6:     GROUP2          A     1
#7:     GROUP2          A     1
#8:     GROUP2          B     1
#9:     GROUP2          B     1
#10:    GROUP2          B     1
#11:    GROUP2          B     1

Data:

df1 <- structure(list(`Variable 1` = c("GROUP1", "GROUP1", "GROUP2", 
"GROUP2"), `Variable 2` = c("A", "B", "A", "B"), Count = c(3L, 
2L, 2L, 4L)), .Names = c("Variable 1", "Variable 2", "Count"),
class = "data.frame", row.names = c(NA, -4L))
akrun
2

Here's a solution:

r> x <- data.frame(v1=c('GROUP1','GROUP1','GROUP2','GROUP2'), v2=c('A','B','A','B'), count=c(3,2,2,4) );
r> x;
      v1 v2 count
1 GROUP1  A     3
2 GROUP1  B     2
3 GROUP2  A     2
4 GROUP2  B     4
r> y <- cbind(do.call(rbind, lapply(1:nrow(x), function(r) do.call(rbind, replicate(x[r,'count'], x[r,names(x)[names(x)!='count']], simplify=FALSE ) ) ) ), count=1 );
r> rownames(y) <- 1:nrow(y);
r> y;
       v1 v2 count
1  GROUP1  A     1
2  GROUP1  A     1
3  GROUP1  A     1
4  GROUP1  B     1
5  GROUP1  B     1
6  GROUP2  A     1
7  GROUP2  A     1
8  GROUP2  B     1
9  GROUP2  B     1
10 GROUP2  B     1
11 GROUP2  B     1
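
One way to sanity-check an expansion like this is to sum the per-row counts back up by group and compare with the original data. The sketch below is an assumption-laden illustration rather than part of the answer: it rebuilds y with the shorter rep()-based indexing instead of the nested do.call() expression, and the name check is hypothetical.

```r
# Original aggregated data, as in the transcript above
x <- data.frame(v1 = c("GROUP1", "GROUP1", "GROUP2", "GROUP2"),
                v2 = c("A", "B", "A", "B"),
                count = c(3, 2, 2, 4))

# Expanded data: one row per observation (rep()-based indexing, for brevity)
y <- x[rep(seq_len(nrow(x)), times = x$count), ]
y$count <- 1
rownames(y) <- NULL

# Sum the counts within each v1/v2 combination
# (check is a hypothetical name used only in this sketch)
check <- aggregate(count ~ v1 + v2, data = y, FUN = sum)

# Put the groups in the same order as the original data before comparing
check <- check[order(check$v1, check$v2), ]
rownames(check) <- NULL

# TRUE when the round trip recovers the original counts
isTRUE(all.equal(check$count, x$count))
```

If the comparison is not TRUE, rows were dropped or duplicated incorrectly during the expansion.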
bgoldst