R: Create duplicate rows based on a variable (dplyr preferred)

Question

I'd like to create a new data frame with duplicated rows based on an existing data frame in R. I'm trying to use the tidyverse as much as possible, so a dplyr solution would be preferred.

Say I have a data frame of times when sales occurred:

df <- data.frame(time = c(0,1,2,3,4,5), sales = c(1,1,2,1,1,3))

> df
  time sales
1    0     1
2    1     1
3    2     2
4    3     1
5    4     1
6    5     3

Instead, I'd like a data frame with one row for each sale:

ans <- data.frame(salesTime = c(0,1,2,2,3,4,5,5,5))

> ans
  salesTime
1         0
2         1
3         2
4         2
5         3
6         4
7         5
8         5
9         5

I found an interesting example using dplyr here: Create duplicate rows based on conditions in R

But that approach only lets me create one new row when sales == n, rather than n new rows.

Any help would be greatly appreciated.

tmfmnk · Accepted Answer · 2020-05-22T12:54:46.380

25

A nice tidyr function for this is uncount():

library(dplyr)
library(tidyr)

df %>%
  uncount(sales) %>%
  rename(salesTime = time)

    salesTime
1           0
2           1
3           2
3.1         2
4           3
5           4
6           5
6.1         5
6.2         5
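For cases where the count column or a per-sale index is still needed afterwards, uncount() also accepts the optional arguments .remove and .id. The following is a minimal sketch, not part of the original answer; the column name sale_no is made up for illustration.

```r
library(tidyr)

df <- data.frame(time = c(0, 1, 2, 3, 4, 5), sales = c(1, 1, 2, 1, 1, 3))

# .remove = FALSE keeps the sales column; .id adds a counter that
# restarts at 1 for every original row (sale_no is an illustrative name)
out <- uncount(df, sales, .remove = FALSE, .id = "sale_no")

nrow(out)    # 9, one row per sale
out$sale_no  # 1 1 1 2 1 1 1 2 3
```

Keeping sales alongside the counter makes it easy to tell, for example, that a row is the second of three sales at time 5.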

edited May 22 '20 at 12:54

answered Sep 25 '18 at 12:47

tmfmnk



I really like this one, I was totally unaware of tidyr::uncount! – colton Sep 25 '18 at 13:03
Great example! I was also unaware of `uncount()`. If you have a data set with only one row, you can duplicate it to 10 rows with `df |> uncount(10)`. Awesome! – MS Berends Nov 18 '22 at 12:22

Andre Elrico · Answer 2 · 2018-09-25T12:41:16.043

4

data.frame(salesTime = rep(df$time, df$sales))

#  salesTime
#1         0
#2         1
#3         2
#4         2
#5         3
#6         4
#7         5
#8         5
#9         5

If you like dplyr and pipes you can go for:

df %>% {data.frame(salesTime = rep(.$time, .$sales))}
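Both forms rely on rep() accepting a vector for its times argument, so each element is repeated its own number of times. The sketch below uses base R only and applies the same idea to row indices, which repeats whole rows; the names idx and expanded are illustrative and not from the answer.

```r
df <- data.frame(time = c(0, 1, 2, 3, 4, 5), sales = c(1, 1, 2, 1, 1, 3))

# rep() with a vector `times` repeats each element its own number of times
rep(df$time, times = df$sales)   # 0 1 2 2 3 4 5 5 5

# the same idea applied to row indices repeats whole rows
idx <- rep(seq_len(nrow(df)), times = df$sales)
expanded <- df[idx, ]

# repeated rows get suffixed row names (3.1, 6.1, 6.2); reset them if unwanted
rownames(expanded) <- NULL
```

Indexing by repeated row numbers keeps every column, which is useful when the data frame has more than the two columns shown here.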

edited Sep 25 '18 at 12:41

answered Sep 25 '18 at 12:35

Andre Elrico


Thank you! Very clear. I knew I was overcomplicating things.... – colton Sep 25 '18 at 12:36
@colton I don't think a good dplyr solution exists here because your "new col" is longer than the original data. – Andre Elrico Sep 25 '18 at 12:37
@Andre Elrico Perhaps a "not good" one? – Nicolas2 Sep 25 '18 at 12:42
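For readers on a newer dplyr, the length mismatch discussed in these comments can be handled by reframe(), which may return any number of rows. This is a sketch that assumes dplyr 1.1.0 or later (the function was not available when the comments were written); it is not something the commenters proposed.

```r
library(dplyr)

df <- data.frame(time = c(0, 1, 2, 3, 4, 5), sales = c(1, 1, 2, 1, 1, 3))

# reframe() may return more (or fewer) rows than its input,
# so the longer vector from rep() is allowed here
ans <- df %>%
  reframe(salesTime = rep(time, sales))

ans$salesTime  # 0 1 2 2 3 4 5 5 5
```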

score 2 · Answer 3 · answered Sep 25 '18 at 12:41


df %>%
  rowwise() %>%
  mutate(time = list(rep(time, sales))) %>%
  unnest()
# A tibble: 9 x 2
#  sales  time
#  <dbl> <dbl>
#1     1     0
#2     1     1
#3     2     2
#4     2     2
#5     1     3
#6     1     4
#7     3     5
#8     3     5
#9     3     5
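With tidyr 1.0.0 and later, unnest() expects the list-column to be named through cols. A sketch of the same list-column approach under that assumption follows; the explicit ungroup() is an added precaution rather than part of the original answer, and the column order of the result may differ from the output printed above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(time = c(0, 1, 2, 3, 4, 5), sales = c(1, 1, 2, 1, 1, 3))

expanded <- df %>%
  rowwise() %>%
  mutate(time = list(rep(time, sales))) %>%  # one list element per row
  ungroup() %>%                              # drop the rowwise grouping
  unnest(cols = time)                        # expand each list element into rows

expanded$time   # 0 1 2 2 3 4 5 5 5
expanded$sales  # 1 1 2 2 1 1 3 3 3
```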


Nicolas2

