Repeat rows of a data.frame N times

Question

I have the following data frame:

data.frame(a = c(1,2,3),b = c(1,2,3))
  a b
1 1 1
2 2 2
3 3 3

I want to repeat the rows n times. For example, here the rows are repeated 3 times:

  a b
1 1 1
2 2 2
3 3 3
4 1 1
5 2 2
6 3 3
7 1 1
8 2 2
9 3 3

Is there an easy function to do this in R? Thanks!

mdsumner · Accepted Answer · 2019-06-26T23:51:57.923

187

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The row names are automatically renumbered to run from 1 to the new number of rows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
n <- 3
do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

d[rep(seq_len(nrow(d)), n), ]
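To make the row-name difference concrete, here is a small base R sketch comparing the two approaches on the data frame from the question. The object names by_rbind, by_index and by_index_reset are only illustrative, and resetting the row names with rownames(x) <- NULL is an optional extra step, not something the answer itself prescribes.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3

# Approach 1: bind n copies together; row names are renumbered 1..9
by_rbind <- do.call("rbind", replicate(n, d, simplify = FALSE))

# Approach 2: index the rows repeatedly; repeated rows get suffixed
# row names such as "1.1", which record the row each copy came from
by_index <- d[rep(seq_len(nrow(d)), n), ]

rownames(by_rbind)
rownames(by_index)

# Optional (assumption: plain sequential row names are wanted)
by_index_reset <- by_index
rownames(by_index_reset) <- NULL
```

After the reset, both results hold the same values with the same row names.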

Here are improvements on the above. The first two use purrr's functional programming style; the idiomatic purrr form is:

purrr::map_dfr(seq_len(3), ~d)

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d)

and finally via indexing rather than list apply using dplyr:

d %>% slice(rep(row_number(), 3))

edited Jun 26 '19 at 23:51

answered Jan 06 '12 at 05:23

mdsumner


5

Beware zero row data frames. seq_len is probably a better option – hadley Jan 06 '12 at 09:17
1

Thanks, I vagued out on that (I always think it's seq_along and wasn't putting in the effort). I appreciate the heads up. – mdsumner Jan 06 '12 at 13:34
2

tidyr::expand and tidyr::uncount are also good options – Arthur Yip Oct 21 '20 at 18:42
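The zero-row caveat raised in the comments above can be checked with a small base R sketch. The names empty, good and bad are hypothetical; the point is only that 1:nrow(x) counts down to c(1, 0) when x has no rows, while seq_len(nrow(x)) yields an empty index.

```r
# A data frame with columns but no rows
empty <- data.frame(a = numeric(0), b = numeric(0))

1:nrow(empty)         # counts down from 1 to 0, so it is c(1, 0)
seq_len(nrow(empty))  # an empty integer vector

# Safe: an empty index returns a zero-row data frame
good <- empty[rep(seq_len(nrow(empty)), 3), ]

# Unsafe: index 1 does not exist, so each repetition adds a row of NA
bad <- empty[rep(1:nrow(empty), 3), ]

nrow(good)
nrow(bad)
```

With seq_len() the result stays empty, whereas the 1:nrow() form silently produces NA rows.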

Max Ghenis · Answer 2 · 2018-08-23T15:44:20.767

For data.frame objects, this solution is several times faster than @mdsumner's and @wojciech-sobala's.

d[rep(seq_len(nrow(d)), n), ]

For data.table objects, @mdsumner's is a bit faster than applying the above after converting to data.frame. For large n this might flip. The timings come from microbenchmark.

Full code:

packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark")
lapply(packages, require, character.only = TRUE)

Repeat1 <- function(d, n) {
  return(do.call("rbind", replicate(n, d, simplify = FALSE)))
}

Repeat2 <- function(d, n) {
  return(Reduce(rbind, list(d)[rep(1L, times=n)]))
}

Repeat3 <- function(d, n) {
  if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
  return(d[rep(seq_len(nrow(d)), n), ])
}

Repeat3.dt.convert <- function(d, n) {
  if ("data.table" %in% class(d)) d <- as.data.frame(d)
  return(d[rep(seq_len(nrow(d)), n), ])
}

# Try with data.frames
mtcars1 <- Repeat1(mtcars, 3)
mtcars2 <- Repeat2(mtcars, 3)
mtcars3 <- Repeat3(mtcars, 3)

checkEquals(mtcars1, mtcars2)
#  Only difference is row.names having ".k" suffix instead of "k" from 1 & 2
checkEquals(mtcars1, mtcars3)

# Works with data.tables too
mtcars.dt <- data.table(mtcars)
mtcars.dt1 <- Repeat1(mtcars.dt, 3)
mtcars.dt2 <- Repeat2(mtcars.dt, 3)
mtcars.dt3 <- Repeat3(mtcars.dt, 3)

# No row.names mismatch since data.tables don't have row.names
checkEquals(mtcars.dt1, mtcars.dt2)
checkEquals(mtcars.dt1, mtcars.dt3)

# Time test
res <- microbenchmark(Repeat1(mtcars, 10),
                      Repeat2(mtcars, 10),
                      Repeat3(mtcars, 10),
                      Repeat1(mtcars.dt, 10),
                      Repeat2(mtcars.dt, 10),
                      Repeat3(mtcars.dt, 10),
                      Repeat3.dt.convert(mtcars.dt, 10))
print(res)
ggsave("repeat_microbenchmark.png", autoplot(res))

Stibu · Answer 3 · 2017-08-18T06:37:57.640

21

The package dplyr contains the function bind_rows() that directly combines all data frames in a list, such that there is no need to use do.call() together with rbind():

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
library(dplyr)
bind_rows(replicate(3, df, simplify = FALSE))

For a large number of repetitions, bind_rows() is also much faster than rbind():

library(microbenchmark)
microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
               bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
               times = 20)
## Unit: milliseconds
##       expr       min        lq      mean   median        uq       max neval cld
##      rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112    20   b
##  bind_rows  1.765956  1.818087  1.881697  1.86207  1.898839  2.321621    20  a

edited Aug 18 '17 at 06:37

answered Aug 11 '17 at 15:30

Stibu


3

I guess `slice(rep(row_number(), 3))` is better, per Max's benchmark. Oh, just saw your bench... personally, I'd think scaling up the size of the DF somewhat would be the right direction, rather than the number of tables, but I don't know. – Frank Aug 11 '17 at 15:34
1

Nice one! When I benchmark it, `slice(df, rep(row_number(), 3))` turns out to be a tiny bit slower than `bind_rows(replicate(...))` (1.9 vs. 2.1 ms). In any case, I thought it was useful to have a `dplyr`-solution as well... – Stibu Aug 11 '17 at 15:42
2

@Frank You are probably right. I didn't check what happens for large data frames, since I just used the one that was provided in the question. – Stibu Aug 11 '17 at 15:45

score 14 · Answer 4 · answered Sep 13 '19 at 08:10

14

With the data.table-package, you could use the special symbol .I together with rep:

library(data.table)

df <- data.frame(a = c(1,2,3), b = c(1,2,3))
dt <- as.data.table(df)

n <- 3

dt[rep(dt[, .I], n)]

which gives the three original rows repeated n times, nine rows in total.


Jaap


Is there a way to use this method to duplicate columnwise? – Stephen Feb 17 '20 at 21:52
1

@Stephen for a dataframe you could do something like: `df[, rep(seq_along(df), n)]`; for a data.table you could do: `cols <- rep(seq_along(mydf), n); mydf[, ..cols]` – Jaap Feb 18 '20 at 13:32
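The column-wise variant from the comment above can be tried out in base R. This is only a sketch on the question's data frame; the name wide is hypothetical. Data frame indexing makes the duplicated column names unique by adding suffixes.

```r
df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3

# Repeat the column positions 1, 2 three times and index by them
wide <- df[, rep(seq_along(df), n)]

dim(wide)    # still 3 rows, now n * 2 columns
names(wide)  # duplicated names are made unique with suffixes
```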

score 5 · Answer 5 · answered Jan 06 '12 at 19:34

5

d <- data.frame(a = c(1,2,3),b = c(1,2,3))
r <- Reduce(rbind, list(d)[rep(1L, times=3L)])


Wojciech Sobala


4

Care to elaborate what you just did and how it compares to mdsumner's answer? Perhaps paste in some results? – Roman Luštrik Jan 07 '12 at 01:28
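As a rough unpacking of the one-liner in this answer: list(d)[rep(1L, times = 3L)] builds a list holding three copies of d, and Reduce() then folds rbind over that list pairwise, i.e. rbind(rbind(d, d), d). The sketch below splits this into steps; the names copies and via_do_call are illustrative only, and the comparison with do.call() is added here just to show that both give the same rows.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# Step 1: a one-element list, indexed three times at position 1,
# gives a list containing three copies of d
copies <- list(d)[rep(1L, times = 3L)]
length(copies)

# Step 2: fold rbind over the list, two data frames at a time
r <- Reduce(rbind, copies)

# For comparison: bind all copies in a single rbind call
via_do_call <- do.call("rbind", copies)

nrow(r)
```

Because Reduce() binds one copy at a time, it performs n - 1 separate rbind calls, whereas do.call() makes a single call.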

Arturo Sbr · Answer 6 · 2019-08-22T15:46:03.827

4

Even simpler:

library(data.table)
my_data <- data.frame(a = c(1,2,3),b = c(1,2,3))
rbindlist(replicate(n = 3, expr = my_data, simplify = FALSE))

edited Aug 22 '19 at 15:46

answered Feb 20 '19 at 21:16

Arturo Sbr


1

From `data.table` package – Mostafa90 Aug 22 '19 at 14:42

score 2 · Answer 7 · answered Apr 01 '16 at 11:22

2

Just use simple indexing with the rep() function.

mydata <- data.frame(a = c(1,2,3), b = c(1,2,3))  # create the data frame
n <- 10                                           # number of times to repeat the rows

mydata <- mydata[rep(rownames(mydata), n), ]      # repeat the row names inside the index
rownames(mydata) <- 1:NROW(mydata)                # renumber the rows for a cleaner look


I guess this is the same as @Max Ghenis's solution – Simon C. Jan 07 '20 at 17:47

A. chahid · Answer 8 · 2021-10-25T13:59:51.357

To compare execution times, here is a benchmark of different ways to row-bind the copies:

> mydata <- data.frame(a=1:200,b=201:400,c=301:500)
> microbenchmark(rbind = do.call("rbind",replicate(n=100,mydata,simplify = FALSE)),
+                bind_rows = bind_rows(replicate(n=100,mydata,simplify = FALSE)),
+                rbindlist = rbindlist(replicate(n=100,exp= mydata,simplify = FALSE)),
+                times= 2000)
Unit: microseconds
      expr    min      lq      mean  median      uq      max neval
     rbind 5760.7 6723.10 8642.6930 7132.30 7761.05 240720.3  2000
 bind_rows  976.4 1186.90 1430.7741 1308.85 1469.80  15817.9  2000
 rbindlist  263.6  347.85  465.5894  392.90  459.95  10974.2  2000

This seems like a good contribution, but I think it would be better to copy and paste the actual code and output rather than an image. — Skaqqs, Oct 21 '21 at 12:12

score 1 · Answer 9 · answered Apr 07 '23 at 22:04

A simple dplyr method of doing this that allows you to vary the number of replications per row by some other column follows.

> exdf <- data.frame(id = LETTERS[1:6],
+                    blue1 = c(T,T,T,T,T,T),
+                    blue2 = c(T,T,F,F,T,T),
+                    red1 = c(T,F,T,F,T,F),
+                    red2 = c(F,F,T,F,F,F),
+                    n_times = 1:6)
> 
> exdf
  id blue1 blue2  red1  red2 n_times
1  A  TRUE  TRUE  TRUE FALSE       1
2  B  TRUE  TRUE FALSE FALSE       2
3  C  TRUE FALSE  TRUE  TRUE       3
4  D  TRUE FALSE FALSE FALSE       4
5  E  TRUE  TRUE  TRUE FALSE       5
6  F  TRUE  TRUE FALSE FALSE       6
> 
> exdf %>% slice(rep(seq_len(n()), n_times))
   id blue1 blue2  red1  red2 n_times
1   A  TRUE  TRUE  TRUE FALSE       1
2   B  TRUE  TRUE FALSE FALSE       2
3   B  TRUE  TRUE FALSE FALSE       2
4   C  TRUE FALSE  TRUE  TRUE       3
5   C  TRUE FALSE  TRUE  TRUE       3
6   C  TRUE FALSE  TRUE  TRUE       3
7   D  TRUE FALSE FALSE FALSE       4
8   D  TRUE FALSE FALSE FALSE       4
9   D  TRUE FALSE FALSE FALSE       4
10  D  TRUE FALSE FALSE FALSE       4
11  E  TRUE  TRUE  TRUE FALSE       5
12  E  TRUE  TRUE  TRUE FALSE       5
13  E  TRUE  TRUE  TRUE FALSE       5
14  E  TRUE  TRUE  TRUE FALSE       5
15  E  TRUE  TRUE  TRUE FALSE       5
16  F  TRUE  TRUE FALSE FALSE       6
17  F  TRUE  TRUE FALSE FALSE       6
18  F  TRUE  TRUE FALSE FALSE       6
19  F  TRUE  TRUE FALSE FALSE       6
20  F  TRUE  TRUE FALSE FALSE       6
21  F  TRUE  TRUE FALSE FALSE       6

Of course, if you want the same number of copies for every row, you can skip "n_times" and put a constant in its place. I think someone else has already demonstrated that: exdf %>% slice(rep(seq_len(n()), 4)) gives four copies of every row.
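The same per-row expansion can be sketched in base R without dplyr, since rep() accepts a vector of counts in its times argument. The example below uses a cut-down version of exdf (only the id and n_times columns) and a hypothetical result name, expanded.

```r
# Reduced version of the example data: one count per row
exdf <- data.frame(id = LETTERS[1:6], n_times = 1:6)

# times takes one count per element, so row i is repeated n_times[i] times
expanded <- exdf[rep(seq_len(nrow(exdf)), times = exdf$n_times), ]

# Optional: replace the suffixed row names with 1, 2, 3, ...
rownames(expanded) <- NULL

nrow(expanded)      # 1 + 2 + ... + 6 rows
table(expanded$id)  # each id appears n_times times
```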

score 0 · Answer 10 · answered Aug 30 '22 at 08:44

0

You can use tidyr::uncount:

data.frame(a = c(1,2,3),b = c(1,2,3)) %>% 
  tidyr::uncount(3)


Maël


score 0 · Answer 11 · answered Feb 12 '23 at 21:33

0

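For data.table:

    dt[, .SD[rep(.I, n)]]
    dt[, .SD[rep(.I, each = n)]]

For data.frame (some issues with rownames):

    df[rep(seq_len(nrow(df)), n), ]
    df[rep(seq_len(nrow(df)), each = n), ]

Here n is the number of repetitions. In each pair, the first form repeats the whole table n times (rows 1, 2, 3, 1, 2, 3, ...), while the second, with each = n, repeats every row n times in a row (rows 1, 1, 2, 2, 3, 3, ...).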
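The two orderings come directly from the times and each arguments of rep(), which a short base R sketch makes visible. The names whole and each_row are illustrative only.

```r
idx <- 1:3

rep(idx, times = 2)  # 1 2 3 1 2 3: the whole sequence, twice
rep(idx, each = 2)   # 1 1 2 2 3 3: every element twice in a row

df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))

# Whole data frame repeated: rows cycle 1, 2, 3, 1, 2, 3
whole <- df[rep(seq_len(nrow(df)), times = 2), ]

# Each row repeated in place: rows run 1, 1, 2, 2, 3, 3
each_row <- df[rep(seq_len(nrow(df)), each = 2), ]

whole$a
each_row$a
```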


mkg

