r create observations from frequency counts

Question

I have frequency counts based on three variables y , Col1, Col2 as shown below

     Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5

How do I expand this table such that the dataset has number of rows, as indicated in column n for combination of responses in Col1, Col2 & y ?

For example the dataset should have 13 rows of Col1=Good, Col2=Rich, y=1, 8 rows of Col1=Good, Col2=Rich, y=0 so on.

Try `i<-rep(row.names(df1), df1$n); df2 <- df1[i,]`. This will remove rows with counts equal to zero. — Rui Barradas, Feb 19 '23 at 06:18

score 1 · Answer 1 · answered Feb 19 '23 at 06:27

Use rep to repeat the row names and subset with its result.
In the first example below I explicitly create an index i, in the second a one-liner solves the problem.

Also, in the first example the output duplicates (as asked for) rows and the row names show which rows are duplicates of which. In the second example by setting the row names to NULL they are recreated to become consecutive numbers starting at 1.

df1 <- "Col1    Col2      y       n
     Good    Poor      0       0
     Good    Poor      1       0
     Good    Rich      1       13
     Good    Rich      0       8
     Bad     Poor      0       8
     Bad     Poor      1       0
     Bad     Rich      1       15
     Bad     Rich      0       5"
df1 <- read.table(text = df1, header = TRUE)

i <- rep(row.names(df1), df1$n)
df2 <- df1[i, ]
head(df2)
#>     Col1 Col2 y  n
#> 3   Good Rich 1 13
#> 3.1 Good Rich 1 13
#> 3.2 Good Rich 1 13
#> 3.3 Good Rich 1 13
#> 3.4 Good Rich 1 13
#> 3.5 Good Rich 1 13

df2 <- df1[rep(row.names(df1), df1$n), ]
row.names(df2) <- NULL
head(df2)
#>   Col1 Col2 y  n
#> 1 Good Rich 1 13
#> 2 Good Rich 1 13
#> 3 Good Rich 1 13
#> 4 Good Rich 1 13
#> 5 Good Rich 1 13
#> 6 Good Rich 1 13

^{Created on 2023-02-19 with reprex v2.0.2}

score 1 · Accepted Answer · answered Feb 19 '23 at 06:33

You could use uncount:

tidyr::uncount(df,n)

   Col1 Col2 y
1  Good Rich 1
2  Good Rich 1
3  Good Rich 1
4  Good Rich 1
5  Good Rich 1
6  Good Rich 1
7  Good Rich 1
8  Good Rich 1
9  Good Rich 1
:   :    :   :
:   :    :   :

The question is why do you need this? You do realize you can still analyze the data the way it is before the counts. What if there were millions of counts for each row? It will not be wise to uncount the data.

score 1 · Answer 3 · answered Feb 19 '23 at 06:39

Here is an alternative using expandRows function from splitstackshape package:

library(splitstackshape)

expandRows(df, "n")

   Col1 Col2 y
3    Good Rich 1
3.1  Good Rich 1
3.2  Good Rich 1
3.3  Good Rich 1
3.4  Good Rich 1
3.5  Good Rich 1
3.6  Good Rich 1
3.7  Good Rich 1
3.8  Good Rich 1
....

r create observations from frequency counts

3 Answers3