
I have a data set that contains a count for every combination of characteristics. A toy example is provided below; in it, for instance, there are three 18-year-old females who make $65,000.

AGE=c(18,19,18,19)
SEX=c("M","F","F","M")
INCOME=c(70000,60000,65000,75000)
COUNT =c(1,2,3,4)
df<-data.frame(AGE,SEX,INCOME,COUNT)

I would like to repeat every observation n times, where n is the value in its COUNT column. I have accomplished this with a for-loop, but it is very inefficient in R.

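# Start with an empty data frame and grow it one group at a time
df4 <- data.frame()
for (i in seq_len(nrow(df))) {
  n <- df$COUNT[i]                      # number of copies of row i
  df4 <- rbind(df4, df[rep(i, n), ])    # append n copies (reallocates each pass)
}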

What is a more efficient way to do this?

Remy M
  • I believe what you're looking for is described here... https://stackoverflow.com/a/49039829/9517359 – TJ83 Jun 28 '19 at 19:46

1 Answer

library(dplyr)
library(tidyr)

AGE=c(18,19,18,19)
SEX=c("M","F","F","M")
INCOME=c(70000,60000,65000,75000)
COUNT =c(1,2,3,4)
df<-data.frame(AGE,SEX,INCOME,COUNT)

df %>% 
    uncount(COUNT)
#>     AGE SEX INCOME
#> 1    18   M  70000
#> 2    19   F  60000
#> 2.1  19   F  60000
#> 3    18   F  65000
#> 3.1  18   F  65000
#> 3.2  18   F  65000
#> 4    19   M  75000
#> 4.1  19   M  75000
#> 4.2  19   M  75000
#> 4.3  19   M  75000

Created on 2019-06-28 by the reprex package (v0.2.1)
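
For comparison, the same expansion can be sketched in base R without any packages. This is only an illustrative alternative, not part of the reprex above: it builds a vector of row indices with rep() and subsets the data frame once, instead of growing it inside a loop. The names idx and expanded are placeholders chosen for this example.

```r
# Toy data from the question
df <- data.frame(
  AGE    = c(18, 19, 18, 19),
  SEX    = c("M", "F", "F", "M"),
  INCOME = c(70000, 60000, 65000, 75000),
  COUNT  = c(1, 2, 3, 4)
)

# Repeat each row index i COUNT[i] times: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4
idx <- rep(seq_len(nrow(df)), times = df$COUNT)

# Subset once with the repeated indices and drop the COUNT column
expanded <- df[idx, c("AGE", "SEX", "INCOME")]

# Optional: replace row names such as "2.1" with plain 1..n
rownames(expanded) <- NULL

nrow(expanded)  # one row per counted observation, i.e. sum(df$COUNT)
```

Because the indexing happens in a single vectorised step, this avoids the repeated rbind() calls that make the loop slow.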

dylanjm