Repeat rows of a data.frame

Question

I want to repeat the rows of a data.frame, each N times. The result should be a new data.frame (with nrow(new.df) == nrow(old.df) * N) keeping the data types of the columns.

Example for N = 2:

                        A B   C
  A B   C             1 j i 100
1 j i 100     -->     2 j i 100
2 K P 101             3 K P 101
                      4 K P 101

So, each row is repeated 2 times and characters remain characters, factors remain factors, numerics remain numerics, ...

My first attempt used apply: apply(old.df, 2, function(co) rep(co, each = N)), but this converts all my values to character and returns a matrix instead of a data.frame:

     A   B   C    
[1,] "j" "i" "100"
[2,] "j" "i" "100"
[3,] "K" "P" "101"
[4,] "K" "P" "101"
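
The coercion happens because apply() first converts the data frame to a matrix, and a matrix can hold only one type. A minimal sketch that reproduces this, assuming example data reconstructed from the table in the question (the column values are taken from that table; nothing else about old.df is known):

```r
# Example data reconstructed from the question's table
old.df <- data.frame(A = c("j", "K"), B = c("i", "P"), C = c(100, 101))
N <- 2

# apply() coerces the data frame to a matrix first; with mixed column
# types that matrix is character, so the numeric column C becomes text
res <- apply(old.df, 2, function(co) rep(co, each = N))

is.matrix(res)   # the result is a matrix, not a data.frame
typeof(res)      # the storage type of the whole matrix
```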

Josh O'Brien · Accepted Answer · 2019-10-15T15:04:32.067

174

df <- data.frame(a = 1:2, b = letters[1:2]) 
df[rep(seq_len(nrow(df)), each = 2), ]
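
As a quick check that row indexing keeps the column types, the classes can be compared before and after. This is only a sketch; the name out is introduced here for illustration and is not part of the answer.

```r
df <- data.frame(a = 1:2, b = letters[1:2])

# Index rows 1, 1, 2, 2: row subsetting keeps each column's type
out <- df[rep(seq_len(nrow(df)), each = 2), ]

sapply(df, class)    # classes before
sapply(out, class)   # same classes after
rownames(out)        # repeated rows get suffixed names such as "1.1"

# Optionally reset the row names to 1..n
rownames(out) <- NULL
```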

edited Oct 15 '19 at 15:04

answered Jun 20 '12 at 14:09

Josh O'Brien


27

You can use `n.times <- c(2,4) ; df[rep(seq_len(nrow(df)), n.times),]` if you want to vary the number of times each line is repeated. – Mark Miller Feb 25 '14 at 23:51
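
The per-row variant from the comment above can be sketched as follows, assuming the same two-row df as in the answer. The names idx and out are illustrative only.

```r
df <- data.frame(a = 1:2, b = letters[1:2])
n.times <- c(2, 4)   # one count per row; length must equal nrow(df)

# rep() pairs each row index with its own count: 1 1 2 2 2 2
idx <- rep(seq_len(nrow(df)), times = n.times)
out <- df[idx, ]

nrow(out)            # equals sum(n.times)
```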

David Rubinger · Answer 2 · 2019-08-06T12:10:05.370

94

A clean dplyr solution, taken from here

library(dplyr)
df <- tibble(x = 1:2, y = c("a", "b"))
df %>% slice(rep(1:n(), each = 2))

edited Aug 06 '19 at 12:10

answered Dec 12 '17 at 19:53

David Rubinger


4

This is the preferable solution imo because it works cleanly in a pipe. – Dan Villarreal May 07 '20 at 19:36

Adam Erickson · Answer 3 · 2018-09-13T15:30:10.583

There is a lovely vectorized solution that repeats each row its own number of times, for example by adding an ntimes column that holds the per-row count:

  A B   C ntimes
1 j i 100      2
2 K P 101      4
3 Z Z 102      1

Method:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2,4,1))
df <- as.data.frame(lapply(df, rep, df$ntimes))

Result:

  A B   C ntimes
1 j i 100      2
2 j i 100      2
3 K P 101      4
4 K P 101      4
5 K P 101      4
6 K P 101      4
7 Z Z 102      1

This is very similar to Josh O'Brien and Mark Miller's method:

df[rep(seq_len(nrow(df)), df$ntimes),]

However, that method appears quite a bit slower:

df <- data.frame(A=c("j","K","Z"), B=c("i","P","Z"), C=c(100,101,102), ntimes=c(2000,3000,4000))

microbenchmark::microbenchmark(
  df[rep(seq_len(nrow(df)), df$ntimes),],
  as.data.frame(lapply(df, rep, df$ntimes)),
  times = 10
)

Result:

Unit: microseconds
                                      expr      min       lq      mean   median       uq      max neval
   df[rep(seq_len(nrow(df)), df$ntimes), ] 3563.113 3586.873 3683.7790 3613.702 3657.063 4326.757    10
 as.data.frame(lapply(df, rep, df$ntimes))  625.552  654.638  676.4067  668.094  681.929  799.893    10

I think that this is the most versatile solution, as it allows you to assign different number of replications per line! I am curious, is there a way to do this in tidyverse? — TCS, Aug 10 '21 at 02:52
@TCS see my answer here: https://stackoverflow.com/a/75962253/2934203 — Brandon, Apr 07 '23 at 22:06
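
One tidyverse option for per-row counts is tidyr::uncount(). This is a sketch under the assumption that the tidyr package is installed, and it is not necessarily the approach used in the linked answer. The names out and out_keep are illustrative.

```r
library(tidyr)

df <- data.frame(A = c("j", "K", "Z"), B = c("i", "P", "Z"),
                 C = c(100, 101, 102), ntimes = c(2, 4, 1))

# uncount() repeats each row according to the weights column;
# by default the weights column itself is dropped from the result
out <- uncount(df, ntimes)

# Keep the count column instead
out_keep <- uncount(df, ntimes, .remove = FALSE)
```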

score 12 · Answer 4 · edited May 23 '17 at 12:26


If you can repeat the whole thing, or subset it first then repeat that, then this similar question may be helpful. Once again:

library(mefa)
rep(mtcars,10)

or simply

mefa:::rep.data.frame(mtcars, 10)

edited May 23 '17 at 12:26

Community


answered Apr 24 '13 at 22:20

dardisco


16

Aha! Another brilliant R function hidden deep inside an obscure specialist package with a totally unrelated name. I love this language! – smci May 20 '14 at 02:20

score 9 · Answer 5 · answered May 20 '14 at 02:23

Adding to what @dardisco mentioned about mefa::rep.data.frame(), it's very flexible.

You can either repeat each row N times:

library(mefa)  # provides the rep() method for data frames
rep(df, each = N)

or repeat the entire dataframe N times (think: like when you recycle a vectorized argument)

rep(df, times=N)

Two thumbs up for mefa! I had never heard of it until now and I had to write manual code to do this.

score 7 · Answer 6 · answered Jul 21 '15 at 18:53

For reference, and adding to the answers citing mefa: it may be worth looking at the implementation of mefa::rep.data.frame() in case you don't want to depend on the whole package:

> data <- data.frame(a=letters[1:3], b=letters[4:6])
> data
  a b
1 a d
2 b e
3 c f
> as.data.frame(lapply(data, rep, 2))
  a b
1 a d
2 b e
3 c f
4 a d
5 b e
6 c f
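
Building on the lapply() call above, a dependency-free helper might look like the following sketch. The function name rep_df is hypothetical and is not part of mefa; its extra arguments are simply passed through to rep().

```r
# Hypothetical helper: repeat every column with the same rep() arguments
rep_df <- function(x, ...) {
  as.data.frame(lapply(x, rep, ...))
}

data <- data.frame(a = letters[1:3], b = letters[4:6])

rep_df(data, times = 2)   # whole data frame twice: rows 1 2 3 1 2 3
rep_df(data, each = 2)    # each row twice:         rows 1 1 2 2 3 3
```

Because every column goes through rep() separately, each column keeps its own type, which is what the question asks for.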

score 5 · Answer 7 · answered May 30 '13 at 18:31


The rep.row function seems to sometimes make lists for columns, which leads to bad memory hijinks. I have written the following which seems to work well:

library(plyr)
rep.row <- function(r, n){
  colwise(function(x) rep(x, n))(r)
}

answered May 30 '13 at 18:31

jebyrnes


Artem Klevtsov · Answer 8 · 2016-03-01T19:18:43.023

My solution is similar to mefa:::rep.data.frame, but it is a little faster and takes care of row names:

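rep.data.frame <- function(x, times) {
    # Save the original row names; lapply() below returns a plain list
    rnames <- attr(x, "row.names")
    # Repeat every column the same number of times
    x <- lapply(x, rep.int, times = times)
    class(x) <- "data.frame"
    if (!is.numeric(rnames))
        # Character row names: repeat them and make them unique
        attr(x, "row.names") <- make.unique(rep.int(rnames, times))
    else
        # Automatic row names: store compact row names for the new length
        attr(x, "row.names") <- .set_row_names(length(rnames) * times)
    x
}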

Compare solutions:

library(Lahman)
library(microbenchmark)
microbenchmark(
    mefa:::rep.data.frame(Batting, 10),
    rep.data.frame(Batting, 10),
    Batting[rep.int(seq_len(nrow(Batting)), 10), ],
    times = 10
)
#> Unit: milliseconds
#>                                            expr       min       lq     mean   median        uq       max neval cld
#>              mefa:::rep.data.frame(Batting, 10) 127.77786 135.3480 198.0240 148.1749  278.1066  356.3210    10  a 
#>                     rep.data.frame(Batting, 10)  79.70335  82.8165 134.0974  87.2587  191.1713  307.4567    10  a 
#>  Batting[rep.int(seq_len(nrow(Batting)), 10), ] 895.73750 922.7059 981.8891 956.3463 1018.2411 1127.3927    10   b

score 1 · Answer 9 · answered Jun 20 '12 at 14:09


Try using, for example,

N <- 2
rep(1:4, each = N)

as a row index.

answered Jun 20 '12 at 14:09

shhhhimhuntingrabbits


score 0 · Answer 10 · answered Jun 03 '15 at 12:07

Another way to do this would be to first record the row indices, append an extra copy of the data frame, and then order by the indices:

df$index = 1:nrow(df)
df = rbind(df,df)
df = df[order(df$index),][,-ncol(df)]

Although the other solutions may be shorter, this method may be more advantageous in certain situations.
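
The same idea can be extended from two copies to N copies. The following is a sketch rather than the answer's own code; the names stacked and out, and the use of replicate() with do.call(rbind, ...), are choices made here for illustration.

```r
df <- data.frame(a = 1:2, b = letters[1:2])
N <- 3

# Tag each row with its original position
df$index <- seq_len(nrow(df))

# Stack N copies of the data frame on top of each other
stacked <- do.call(rbind, replicate(N, df, simplify = FALSE))

# Sort by the original position so the copies sit next to each other,
# then drop the helper column and reset the row names
out <- stacked[order(stacked$index), names(stacked) != "index"]
rownames(out) <- NULL
```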
