Duplicating rows of a dataframe based on another vector in R

Question

Let's say I have the following data frame:

set.seed(1)
df <- data.frame("x" = 1:5, "y" = rnorm(5))

  x          y
1 1 -0.6264538
2 2  0.1836433
3 3 -0.8356286
4 4  1.5952808
5 5  0.3295078

And I want to repeat each row as many times as indicated in x, like so:

   x          y
1  1 -0.6264538
2  2  0.1836433
3  2  0.1836433
4  3 -0.8356286
5  3 -0.8356286
6  3 -0.8356286
7  4  1.5952808
8  4  1.5952808
9  4  1.5952808
10 4  1.5952808
11 5  0.3295078
12 5  0.3295078
13 5  0.3295078
14 5  0.3295078
15 5  0.3295078

How would I go about doing that? I'd prefer a tidyverse solution, but I'm open to any other suggestions.

score 3 · Accepted Answer · answered Jul 03 '18 at 18:44

We can use rep() to build a vector of row indices, with the times argument saying how many times to repeat each one, and then subset the data frame with that vector.

df[rep(1:nrow(df), times = df$x), ]
    x          y
1   1 -0.6264538
2   2  0.1836433
2.1 2  0.1836433
3   3 -0.8356286
3.1 3 -0.8356286
3.2 3 -0.8356286
4   4  1.5952808
4.1 4  1.5952808
4.2 4  1.5952808
4.3 4  1.5952808
5   5  0.3295078
5.1 5  0.3295078
5.2 5  0.3295078
5.3 5  0.3295078
5.4 5  0.3295078
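
The subsetting above keeps row names such as 2.1 and 3.1, while the output asked for in the question is numbered 1 to 15. A minimal sketch of one way to get there, assuming you don't need the original row names: the names idx and out are illustrative and not part of the answer, and seq_len(nrow(df)) is swapped in for 1:nrow(df) because it also behaves sensibly on a zero-row data frame.

```r
set.seed(1)
df <- data.frame(x = 1:5, y = rnorm(5))

# One index per output row: 1, 2, 2, 3, 3, 3, ...
idx <- rep(seq_len(nrow(df)), times = df$x)

out <- df[idx, ]

# Drop the "2.1", "3.1", ... row names so rows are numbered 1..n again
rownames(out) <- NULL

nrow(out)  # 15, i.e. sum(df$x)
```

Resetting the row names is purely cosmetic; the data in out is the same either way.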

Gregor Thomas

I evidently need to go for a walk, given my inability to consider the simplicity of my problem. Thanks. – Phil Jul 03 '18 at 18:45
This is actually very similar to a question I asked years ago, [How to repeat a data frame?](https://stackoverflow.com/q/13275260/903061) I felt the same way when I got the answer. – Gregor Thomas Jul 03 '18 at 18:48
Or another option is `expandRows(df, 'x', drop = FALSE)` from the splitstackshape package – akrun Jul 03 '18 at 18:54

Mankind_008 · Answer 2 · 2018-07-03T19:03:30.107

score 2

Using dplyr:

dplyr::slice(df, rep(1:n(), x))  # as per Gregor's recommendation

Or, explicitly:

dplyr::slice(df, rep(1:nrow(df), df$x))

edited Jul 03 '18 at 19:03

answered Jul 03 '18 at 18:48

Mankind_008

Aw, but if you're going to use `dplyr`, use the cool `n()` instead of lame old `nrow()`. It saves three characters of typing! ;) – Gregor Thomas Jul 03 '18 at 19:02

Onyambu · Answer 3 · 2018-07-03T18:46:53.887

score 0

with(df, df[rep(1:nrow(df), x), ])
    x          y
1   1 -0.6264538
2   2  0.1836433
2.1 2  0.1836433
3   3 -0.8356286
3.1 3 -0.8356286
3.2 3 -0.8356286
4   4  1.5952808
4.1 4  1.5952808
4.2 4  1.5952808
4.3 4  1.5952808
5   5  0.3295078
5.1 5  0.3295078
5.2 5  0.3295078
5.3 5  0.3295078
5.4 5  0.3295078

edited Jul 03 '18 at 18:46

answered Jul 03 '18 at 18:44

Onyambu

score 0 · Answer 4 · answered Jul 03 '18 at 18:47

df[ rep(seq_len(nrow(df)), df$x), ]

    x           y
1   1 -1.31142059
2   2 -0.09652492
2.1 2 -0.09652492
3   3  2.36971991
3.1 3  2.36971991
3.2 3  2.36971991
4   4  0.89062648
4.1 4  0.89062648
4.2 4  0.89062648
4.3 4  0.89062648
5   5 -0.25218316
5.1 5 -0.25218316
5.2 5 -0.25218316
5.3 5 -0.25218316
5.4 5 -0.25218316

Looks like several of us got to it at the same time ...

score 0 · Answer 5 · answered Aug 31 '18 at 05:18

I've recently discovered tidyr::uncount(), which works just as well:

tidyr::uncount(df, x)
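
One thing to watch for: uncount() is exported by tidyr rather than dplyr, and by default it drops the weights column, so x would be missing from the result. A small sketch, assuming tidyr is installed, of keeping x with the .remove argument; the names dropped and kept are illustrative only.

```r
library(tidyr)

set.seed(1)
df <- data.frame(x = 1:5, y = rnorm(5))

# Default (.remove = TRUE): rows are repeated, but x is dropped
dropped <- uncount(df, x)

# .remove = FALSE keeps x, matching the output asked for in the question
kept <- uncount(df, x, .remove = FALSE)

names(dropped)  # "y"
names(kept)     # "x" "y"
```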

Phil
