107

I have the following data frame:

data.frame(a = c(1,2,3),b = c(1,2,3))
  a b
1 1 1
2 2 2
3 3 3

I want to repeat the rows n times. For example, here the rows are repeated 3 times:

  a b
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3
7 1 1
8 2 2
9 3 3

Is there an easy function to do this in R? Thanks!

Henrik
  • 65,555
  • 14
  • 143
  • 159
Michael
  • 7,087
  • 21
  • 52
  • 81

11 Answers11

187

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ]

Here are improvements on the above, the first two using purrr functional programming, idiomatic purrr:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally via indexing rather than list apply using dplyr:

d %>% slice(rep(row_number(), 3))
mdsumner
  • 29,099
  • 6
  • 83
  • 91
49

For data.frame objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.

d[rep(seq_len(nrow(d)), n), ]

For data.table objects, @mdsummer's is a bit faster than applying the above after converting to data.frame. For large n this might flip. microbenchmark.

Full code:

packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark")
lapply(packages, require, character.only=T)

Repeat1 <- function(d, n) {
  return(do.call("rbind", replicate(n, d, simplify = FALSE)))
}

Repeat2 <- function(d, n) {
  return(Reduce(rbind, list(d)[rep(1L, times=n)]))
}

Repeat3 <- function(d, n) {
  if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
  return(d[rep(seq_len(nrow(d)), n), ])
}

Repeat3.dt.convert <- function(d, n) {
  if ("data.table" %in% class(d)) d <- as.data.frame(d)
  return(d[rep(seq_len(nrow(d)), n), ])
}

# Try with data.frames
mtcars1 <- Repeat1(mtcars, 3)
mtcars2 <- Repeat2(mtcars, 3)
mtcars3 <- Repeat3(mtcars, 3)

checkEquals(mtcars1, mtcars2)
#  Only difference is row.names having ".k" suffix instead of "k" from 1 & 2
checkEquals(mtcars1, mtcars3)

# Works with data.tables too
mtcars.dt <- data.table(mtcars)
mtcars.dt1 <- Repeat1(mtcars.dt, 3)
mtcars.dt2 <- Repeat2(mtcars.dt, 3)
mtcars.dt3 <- Repeat3(mtcars.dt, 3)

# No row.names mismatch since data.tables don't have row.names
checkEquals(mtcars.dt1, mtcars.dt2)
checkEquals(mtcars.dt1, mtcars.dt3)

# Time test
res <- microbenchmark(Repeat1(mtcars, 10),
                      Repeat2(mtcars, 10),
                      Repeat3(mtcars, 10),
                      Repeat1(mtcars.dt, 10),
                      Repeat2(mtcars.dt, 10),
                      Repeat3(mtcars.dt, 10),
                      Repeat3.dt.convert(mtcars.dt, 10))
print(res)
ggsave("repeat_microbenchmark.png", autoplot(res))
Max Ghenis
  • 14,783
  • 16
  • 84
  • 132
21

The package dplyr contains the function bind_rows() that directly combines all data frames in a list, such that there is no need to use do.call() together with rbind():

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
library(dplyr)
bind_rows(replicate(3, df, simplify = FALSE))

For a large number of repetions bind_rows() is also much faster than rbind():

library(microbenchmark)
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
               bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
               times = 20)
## Unit: milliseconds
##       expr       min        lq      mean   median        uq       max neval cld
##      rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112    20   b
##  bind_rows  1.765956  1.818087  1.881697  1.86207  1.898839  2.321621    20  a 
Stibu
  • 15,166
  • 6
  • 57
  • 71
  • 3
    I guess `slice(rep(row_number(), 3))` is better, per Max's benchmark. Oh, just saw your bench... personally, I'd think scaling up the size of the DF somewhat would be the right direction, rather than the number of tables, but I don't know. – Frank Aug 11 '17 at 15:34
  • 1
    Nice one! When I benchmark it, `slice(df, rep(row_number(), 3))` turns out to be a tiny bit slower than `bind_rows(replicate(...))` (1.9 vs. 2.1 ms). In any case, I thought it was useful to have a `dplyr`-solution as well... – Stibu Aug 11 '17 at 15:42
  • 2
    @Frank You are probably right. I didn't check what happens for large data frames, since I just used the one that was provided in the question. – Stibu Aug 11 '17 at 15:45
14

With the -package, you could use the special symbol .I together with rep:

df <- data.frame(a = c(1,2,3), b = c(1,2,3))
dt <- as.data.table(df)

n <- 3

dt[rep(dt[, .I], n)]

which gives:

   a b
1: 1 1
2: 2 2
3: 3 3
4: 1 1
5: 2 2
6: 3 3
7: 1 1
8: 2 2
9: 3 3
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • Is there a way to use this method to duplicate columnwise? – Stephen Feb 17 '20 at 21:52
  • 1
    @Stephen for a dataframe you could do something like: `df[, rep(seq_along(df), n)]`; for a data.table you could do: `cols <- rep(seq_along(mydf), n); mydf[, ..cols]` – Jaap Feb 18 '20 at 13:32
5
d <- data.frame(a = c(1,2,3),b = c(1,2,3))
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])
Wojciech Sobala
  • 7,431
  • 2
  • 21
  • 27
4

Even simpler:

library(data.table)
my_data <- data.frame(a = c(1,2,3),b = c(1,2,3))
rbindlist(replicate(n = 3, expr = my_data, simplify = FALSE)
Arturo Sbr
  • 5,567
  • 4
  • 38
  • 76
2

Just use simple indexing with repeat function.

mydata<-data.frame(a = c(1,2,3),b = c(1,2,3)) #creating your data frame  
n<-10           #defining no. of time you want repetition of the rows of your dataframe

mydata<-mydata[rep(rownames(mydata),n),] #use rep function while doing indexing 
rownames(mydata)<-1:NROW(mydata)    #rename rows just to get cleaner look of data
1

For time execution purposes, i would like to suggest a comparison of different way of rbind:

> mydata <- data.frame(a=1:200,b=201:400,c=301:500)
> microbenchmark(rbind = do.call("rbind",replicate(n=100,mydata,simplify = FALSE)),
+                bind_rows = bind_rows(replicate(n=100,mydata,simplify = FALSE)),
+                rbindlist = rbindlist(replicate(n=100,exp= mydata,simplify = FALSE)),
+                times= 2000)
Unit: microseconds
      expr    min      lq      mean  median      uq      max neval
     rbind 5760.7 6723.10 8642.6930 7132.30 7761.05 240720.3  2000
 bind_rows  976.4 1186.90 1430.7741 1308.85 1469.80  15817.9  2000
 rbindlist  263.6  347.85  465.5894  392.90  459.95  10974.2  2000
A. chahid
  • 184
  • 2
  • 5
  • This seems like a good contribution, but I think it would be better to copy and paste the actual code and output rather than an image. – Skaqqs Oct 21 '21 at 12:12
1

A simple dplyr method of doing this that allows you to vary the number of replications per row by some other column follows.

> exdf <- data.frame(id = LETTERS[1:6],
+                    blue1 = c(T,T,T,T,T,T),
+                    blue2 = c(T,T,F,F,T,T),
+                    red1 = c(T,F,T,F,T,F),
+                    red2 = c(F,F,T,F,F,F),
+                    n_times = 1:6)
> 
> exdf
  id blue1 blue2  red1  red2 n_times
1  A  TRUE  TRUE  TRUE FALSE       1
2  B  TRUE  TRUE FALSE FALSE       2
3  C  TRUE FALSE  TRUE  TRUE       3
4  D  TRUE FALSE FALSE FALSE       4
5  E  TRUE  TRUE  TRUE FALSE       5
6  F  TRUE  TRUE FALSE FALSE       6
> 
> exdf %>% slice(rep(seq(n()), n_times))
   id blue1 blue2  red1  red2 n_times
1   A  TRUE  TRUE  TRUE FALSE       1
2   B  TRUE  TRUE FALSE FALSE       2
3   B  TRUE  TRUE FALSE FALSE       2
4   C  TRUE FALSE  TRUE  TRUE       3
5   C  TRUE FALSE  TRUE  TRUE       3
6   C  TRUE FALSE  TRUE  TRUE       3
7   D  TRUE FALSE FALSE FALSE       4
8   D  TRUE FALSE FALSE FALSE       4
9   D  TRUE FALSE FALSE FALSE       4
10  D  TRUE FALSE FALSE FALSE       4
11  E  TRUE  TRUE  TRUE FALSE       5
12  E  TRUE  TRUE  TRUE FALSE       5
13  E  TRUE  TRUE  TRUE FALSE       5
14  E  TRUE  TRUE  TRUE FALSE       5
15  E  TRUE  TRUE  TRUE FALSE       5
16  F  TRUE  TRUE FALSE FALSE       6
17  F  TRUE  TRUE FALSE FALSE       6
18  F  TRUE  TRUE FALSE FALSE       6
19  F  TRUE  TRUE FALSE FALSE       6
20  F  TRUE  TRUE FALSE FALSE       6
21  F  TRUE  TRUE FALSE FALSE       6

Of course if you wanted the same value and to skip using "n_times" you could just choose a static number in its place. I think someone else has already demonstrated that... exdf %>% slice(rep(seq(n()), 4)) will duplicate all rows 4 times.

Brandon
  • 1,722
  • 1
  • 19
  • 32
0

You can use tidyr::uncount:

data.frame(a = c(1,2,3),b = c(1,2,3)) %>% 
  tidyr::uncount(3)
  a b
1 1 1
2 1 1
3 1 1
4 2 2
5 2 2
6 2 2
7 3 3
8 3 3
9 3 3
Maël
  • 45,206
  • 3
  • 29
  • 67
0

For data table

    dt[,.SD[rep(.I,n)]]
    dt[,.SD[rep(.I,each=n)]]

For data.frame (some issues with rownames)

    df[rep(1:nrow(df),n),]
    df[rep(1:nrow(df),each=n),]

n number of repetition

mkg
  • 41
  • 4