
Let's say I have the following data frame:

set.seed(1)
df <- data.frame("x" = 1:5, "y" = rnorm(5))

  x          y
1 1 -0.6264538
2 2  0.1836433
3 3 -0.8356286
4 4  1.5952808
5 5  0.3295078

And I want to duplicate each row as many times as the value in `x`, like so:

   x          y
1  1 -0.6264538
2  2  0.1836433
3  2  0.1836433
4  3 -0.8356286
5  3 -0.8356286
6  3 -0.8356286
7  4  1.5952808
8  4  1.5952808
9  4  1.5952808
10 4  1.5952808
11 5  0.3295078
12 5  0.3295078
13 5  0.3295078
14 5  0.3295078
15 5  0.3295078

How would I go about doing that? While I'd prefer a tidyverse solution, I'm open to any other suggestions.

Phil

5 Answers


We can use `rep()` to build a vector of row indices, using its `times` argument to say how many times to repeat each row, and then subset the data frame with that vector:

df[rep(1:nrow(df), times = df$x), ]
    x          y
1   1 -0.6264538
2   2  0.1836433
2.1 2  0.1836433
3   3 -0.8356286
3.1 3 -0.8356286
3.2 3 -0.8356286
4   4  1.5952808
4.1 4  1.5952808
4.2 4  1.5952808
4.3 4  1.5952808
5   5  0.3295078
5.1 5  0.3295078
5.2 5  0.3295078
5.3 5  0.3295078
5.4 5  0.3295078
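To see what the subsetting does, it helps to look at the index vector on its own. The following is a minimal sketch that assumes the `df` from the question. The name `idx` and the final row-name reset are illustrative additions, not part of the answer above:

```r
set.seed(1)
df <- data.frame(x = 1:5, y = rnorm(5))

# rep() repeats each row index as many times as the matching value of df$x
idx <- rep(1:nrow(df), times = df$x)
print(idx)  # the vector 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5

# Subsetting rows with the repeated indices duplicates those rows
out <- df[idx, ]

# R makes duplicated row names unique ("2.1", "3.2", ...);
# reset them if plain 1..n row names are wanted
rownames(out) <- NULL
print(out)
```

A count of 0 in `x` would simply drop that row, because `rep()` emits its index zero times.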
Gregor Thomas
  • I evidently need to go for a walk, given my inability to consider the simplicity of my problem. Thanks. – Phil Jul 03 '18 at 18:45
  • This is actually very similar to a question I asked years ago, [How to repeat a data frame?](https://stackoverflow.com/q/13275260/903061) I felt the same way when I got the answer. – Gregor Thomas Jul 03 '18 at 18:48
  • Another option is `splitstackshape::expandRows(df, 'x', drop = FALSE)` – akrun Jul 03 '18 at 18:54

Using dplyr:

dplyr::slice(df, rep(1:n(), x))  # as per Sir Gregor's recommendation

Or, more explicitly:

dplyr::slice(df, rep(1:nrow(df), df$x))
Mankind_008
  • Aw, but if you're going to use `dplyr`, use the cool `n()` instead of lame old `nrow()`. It saves three characters of typing! ;) – Gregor Thomas Jul 03 '18 at 19:02
with(df, df[rep(1:nrow(df), x), ])
    x          y
1   1 -0.6264538
2   2  0.1836433
2.1 2  0.1836433
3   3 -0.8356286
3.1 3 -0.8356286
3.2 3 -0.8356286
4   4  1.5952808
4.1 4  1.5952808
4.2 4  1.5952808
4.3 4  1.5952808
5   5  0.3295078
5.1 5  0.3295078
5.2 5  0.3295078
5.3 5  0.3295078
5.4 5  0.3295078
Onyambu
df[rep(seq_len(nrow(df)), df$x), ]

    x           y
1   1 -1.31142059
2   2 -0.09652492
2.1 2 -0.09652492
3   3  2.36971991
3.1 3  2.36971991
3.2 3  2.36971991
4   4  0.89062648
4.1 4  0.89062648
4.2 4  0.89062648
4.3 4  0.89062648
5   5 -0.25218316
5.1 5 -0.25218316
5.2 5 -0.25218316
5.3 5 -0.25218316
5.4 5 -0.25218316

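Looks like several of us got to it at the same time... (The `y` values above differ from the question's only because they come from a different random draw; the row duplication is the same.)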
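One practical reason to prefer `seq_len(nrow(df))` over `1:nrow(df)` is the empty case. For a zero-row data frame, `1:nrow(df)` is `1:0`, which is `c(1, 0)` rather than an empty vector. The sketch below uses a hypothetical empty data frame, `empty`, with the same columns as the question's:

```r
# Hypothetical zero-row data frame with the same columns as df
empty <- data.frame(x = integer(0), y = numeric(0))

print(1:nrow(empty))         # 1 0  (counts down, not empty)
print(seq_len(nrow(empty)))  # integer(0)

# With seq_len() the index is empty, so the result is an empty data frame
res <- empty[rep(seq_len(nrow(empty)), empty$x), ]
print(nrow(res))  # 0
```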

MHammer

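I've recently discovered `tidyr::uncount()`, which works just as well. By default it drops the count column, so pass `.remove = FALSE` to keep `x` as in the desired output:

tidyr::uncount(df, x, .remove = FALSE)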
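For comparison, here is a sketch of a hypothetical base R helper, `expand_by()`, that covers the basic behaviour of `tidyr::uncount()` without the package. It repeats rows by a count column, optionally drops that column, and resets the row names. It is not a drop-in replacement; it only rejects negative or missing counts and does none of `uncount()`'s other checks:

```r
# Hypothetical helper: repeat each row of `data` by the count in column `col`
expand_by <- function(data, col, remove = FALSE) {
  counts <- data[[col]]
  if (anyNA(counts) || any(counts < 0)) {
    stop("counts must be non-negative and non-missing")
  }
  # Rows with a count of 0 are dropped, since rep() emits their index 0 times
  out <- data[rep(seq_len(nrow(data)), times = counts), , drop = FALSE]
  if (remove) out[[col]] <- NULL  # drop the count column, similar to uncount()'s default
  rownames(out) <- NULL           # replace "2.1"-style row names with 1..n
  out
}

set.seed(1)
df <- data.frame(x = 1:5, y = rnorm(5))
expand_by(df, "x")                 # keeps x, similar to uncount(df, x, .remove = FALSE)
expand_by(df, "x", remove = TRUE)  # drops x, similar to uncount(df, x)
```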
Phil