2

I have this group-level data.frame:

group_table <- 
  data.frame(where = c("01", "01", "01"),
             age = c(45, 46, 47),
             males = c(5, 2, 2),
             females = c(2, 3, 3))

#   where age males females
# 1    01  45     5       2
# 2    01  46     2       3
# 3    01  47     2       3

My task now is relatively simple: for each male and female grouped in this table, I want a single row in an individual-level table. The first five rows of the table should look like this:

#   where age gender
# 1    01  45   male
# 2    01  45   male
# 3    01  45   male
# 4    01  45   male
# 5    01  45   male

Is there an elegant and efficient way to do it? (The resulting table will have millions of rows).

CptNemo
  • 6,455
  • 16
  • 58
  • 107

2 Answers2

2

Bring count of males and females in one column using pivot_longer and use uncount to repeat rows based on count.

library(tidyr)
group_table %>% pivot_longer(cols = c(males, females)) %>% uncount(value)

# A tibble: 17 x 3
#   where   age name   
#   <chr> <dbl> <chr>  
# 1 01       45 males  
# 2 01       45 males  
# 3 01       45 males  
# 4 01       45 males  
# 5 01       45 males  
# 6 01       45 females
# 7 01       45 females
# 8 01       46 males  
# 9 01       46 males  
#10 01       46 females
#11 01       46 females
#12 01       46 females
#13 01       47 males  
#14 01       47 males  
#15 01       47 females
#16 01       47 females
#17 01       47 females
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • That's great. Thanks! Can you explain what `uncount(value)` actually does? – CptNemo Jun 23 '20 at 05:01
  • 1
    If you look at output from `group_table %>% pivot_longer(cols = c(males, females))` you see column `value`, `uncount` repeats rows that number of times. So first row is repeated 5 times, second row 2 times and so on. – Ronak Shah Jun 23 '20 at 05:04
1

We can use melt from data.table

library(data.table)
melt(setDT(group_table), measure = c('males', 'females'),
      variable.name = 'gender')[rep(seq_len(.N), value)][, value := NULL][]
#     where age  gender
# 1:    01  45   males
# 2:    01  45   males
# 3:    01  45   males
# 4:    01  45   males
# 5:    01  45   males
# 6:    01  46   males
# 7:    01  46   males
# 8:    01  47   males
# 9:    01  47   males
#10:    01  45 females
#11:    01  45 females
#12:    01  46 females
#13:    01  46 females
#14:    01  46 females
#15:    01  47 females
#16:    01  47 females
#17:    01  47 females
akrun
  • 874,273
  • 37
  • 540
  • 662