
I have this dataframe:

df <- data.frame(
  ID = 1:5,
  Subject = c("A","A","B","B","C"),
  Duration = c(3,2,2,4,5)
)

The task is straightforward: I need to repeat each row as many times as the value in its Duration column. For example, Duration in row #1 is 3, so this row should be triplicated; Duration in row #2 is 2, so this row should be duplicated, and so on. How can this be done?

Expected:

  ID Subject Duration
1  1       A        3
2  1       A        3
3  1       A        3
4  2       A        2
5  2       A        2
6  3       B        2
7  3       B        2
8  4       B        4
9  4       B        4
10 4       B        4
11 4       B        4
12 5       C        5
13 5       C        5
14 5       C        5
15 5       C        5
16 5       C        5

I'm grateful for any solution, particularly for a dplyr one.

Chris Ruehlemann
    this post may be helpful: https://stackoverflow.com/questions/2894775/repeat-each-row-of-data-frame-the-number-of-times-specified-in-a-column – Mel G May 08 '22 at 15:21
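For comparison with the answers below, here is a minimal base R sketch of the same idea, assuming no packages are wanted. It builds an index vector with `rep()` and subsets the data frame with it; the names `idx` and `expanded` are illustrative only.

```r
df <- data.frame(
  ID = 1:5,
  Subject = c("A","A","B","B","C"),
  Duration = c(3,2,2,4,5)
)

# Index vector: each row number repeated Duration times
idx <- rep(seq_len(nrow(df)), df$Duration)

# Subsetting with repeated indices duplicates the rows
expanded <- df[idx, ]

# Duplicated rows get names like 1.1, 1.2; reset them to 1..n
rownames(expanded) <- NULL
expanded
```

Resetting the row names is optional; it only makes the printed result match the expected output above.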

2 Answers


The function you need is tidyr::uncount.

library(tidyr)

# .remove = FALSE keeps the Duration column in the output
uncount(df, Duration, .remove = FALSE)

   ID Subject Duration
1   1       A        3
2   1       A        3
3   1       A        3
4   2       A        2
5   2       A        2
6   3       B        2
7   3       B        2
8   4       B        4
9   4       B        4
10  4       B        4
11  4       B        4
12  5       C        5
13  5       C        5
14  5       C        5
15  5       C        5
16  5       C        5
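`uncount()` also has an `.id` argument that adds a column numbering the copies of each original row. A short sketch, where the column name `"copy"` is an arbitrary choice for illustration:

```r
library(tidyr)

df <- data.frame(
  ID = 1:5,
  Subject = c("A","A","B","B","C"),
  Duration = c(3,2,2,4,5)
)

# .remove = FALSE keeps Duration; .id = "copy" adds a counter
# that runs 1..Duration within each original row
out <- uncount(df, Duration, .remove = FALSE, .id = "copy")
out
```

The counter can be handy later, for example to tell the first copy of each row apart from its duplicates.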
benson23

We could use slice():

library(dplyr)

# Repeat each row index Duration times, then keep those rows
df %>% 
  slice(rep(row_number(), Duration))

   ID Subject Duration
1   1       A        3
2   1       A        3
3   1       A        3
4   2       A        2
5   2       A        2
6   3       B        2
7   3       B        2
8   4       B        4
9   4       B        4
10  4       B        4
11  4       B        4
12  5       C        5
13  5       C        5
14  5       C        5
15  5       C        5
16  5       C        5
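To see what `slice()` receives here, the index vector can be built by hand. For this `df`, `row_number()` is `1:5` and `Duration` is `c(3, 2, 2, 4, 5)`, so `rep()` repeats each row number by the matching Duration value:

```r
# Row numbers 1:5, each repeated by the matching Duration value
idx <- rep(1:5, c(3, 2, 2, 4, 5))
idx
# 1 1 1 2 2 3 3 4 4 4 4 5 5 5 5 5
```

`slice()` keeps duplicated indices, so each row appears as many times as its number occurs in `idx`.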
TarJae
    Thanks for this. Quick question: in the actual data, the total `Duration` is exactly the same for all `Subject`s, namely 5000. However, your method produces 4999 rows for one participant. Any idea why this occurs and, perhaps, how to fix it? – Chris Ruehlemann May 08 '22 at 17:30
  • I am not sure. But maybe we could try `slice(rep(1:nrow(df), Duration))`. And maybe adapt the one to 0. – TarJae May 08 '22 at 17:35
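One possible cause, offered as an assumption rather than a diagnosis of the actual data: `rep()` truncates non-integer `times` values toward zero (this is documented in `?rep`). If a `Duration` is stored as, say, 1.9999999 because it was computed upstream, that row gets one copy fewer. The sketch below uses hypothetical data (`dat`) to show the effect and a `round()` fix:

```r
library(dplyr)

# Hypothetical data: the second Duration is meant to be 2 but is
# stored as 1.9999999 (e.g. after floating-point arithmetic)
dat <- data.frame(ID = 1:2, Duration = c(3, 1.9999999))

# rep() truncates 1.9999999 to 1, so row 2 is kept only once
short <- dat %>% slice(rep(row_number(), Duration))
nrow(short)  # 4, not 5

# Look for Duration values that are not whole numbers
dat %>% filter(Duration != round(Duration))

# Rounding before expanding restores the intended row count
fixed <- dat %>%
  mutate(Duration = round(Duration)) %>%
  slice(rep(row_number(), Duration))
nrow(fixed)  # 5
```

If the filter step returns no rows on the real data, this is not the cause and the discrepancy lies elsewhere.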