41

I have a data.frame df and I want that every row in this df is duplicated lengthTime times and that a new column is added that counts from 1 to lengthTime for each row in df.

I know, it sounds pretty complicated, but what I basically want is to apply expand.grid to df. Here is an ugly workaround and I have the feeling that there most be an easier solution (maybe even a base-R function?):

df <- data.frame(ID   = rep(letters[1:3], each=3),
                 CatA = rep(1:3, times = 3),
                 CatB = letters[1:9])
lengthTime <- 3
nrRow <- nrow(df)
intDF <- df
for (i in 1:(lengthTime - 1)) {
  df <- rbind(df, intDF)
}
df$Time <- rep(1:lengthTime, each=nrRow)

I thought that I could just use expand.grid(df, 1:lengthTime), but that does not work. outer did not bring any luck either. So does anyone know a good solution?

Christoph_J
  • 6,804
  • 8
  • 44
  • 58

6 Answers6

65

It's been a while since this question was posted, but I recently came across it looking for just the thing in the title, namely, an expand.grid that works for data frames. The posted answers address the OP's more specific question, so in case anyone is looking for a more general solution for data frames, here's a slightly more general approach:

expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))

# For the example in the OP
expand.grid.df(df, data.frame(1:lengthTime))

# More generally
df1 <- data.frame(A=1:3, B=11:13)
df2 <- data.frame(C=51:52, D=c("Y", "N"))
df3 <- data.frame(E=c("+", "-"))
expand.grid.df(df1, df2, df3)
ytsaig
  • 3,267
  • 3
  • 23
  • 27
  • Worked really nice ... I used this method for [this SO post](https://stackoverflow.com/q/42112654/10059841), in my solution here: [SO link](https://stackoverflow.com/a/53808294/10059841) – Marian Minar Dec 18 '18 at 02:47
25

You can also just do a simple merge by NULL (which will cause merge to do simple combinatorial data replication):

merge(data.frame(time=1:lengthTime), iris, by=NULL)
Holger Brandl
  • 10,634
  • 3
  • 64
  • 63
  • I like this solution the most but for the sake of simplicity (avoiding dependencies) you could have just wrote `merge(data.frame(time=1:lengthTime), iris, by=NULL)` – jakob-r Nov 07 '18 at 13:05
19

Why not just something like df[rep(1:nrow(df),times = 3),] to extend the data frame, and then add the extra column just as you have above, with df$Time <- rep(1:lengthTime, each=nrRow)?

joran
  • 169,992
  • 32
  • 429
  • 468
13

Quick update

There is now also the crossing() function in package tidyr which can be used instead of merge, is somewhat faster, and returns a tbl_df / tibble.

data.frame(time=1:10) %>% merge(iris, by=NULL) 

data.frame(time=1:10) %>% tidyr::crossing(iris) 
Andrew
  • 733
  • 6
  • 11
2

This works:

REP <- rep(1:nrow(df), 3)
df2 <- data.frame(df[REP, ], Time = rep(1:3, each = 9))
rownames(df2) <- NULL
df2
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
  • Mine really isn't any different than joran's who beat me by 40 some seconds but I'll leave it as it's slightly more explicit. – Tyler Rinker Jul 27 '12 at 18:44
  • First come, first serve, so I accepted his answer ;-) But +1 for both of you. That's a very neat solution! – Christoph_J Jul 27 '12 at 19:00
2

A data.table solution:

> library(data.table)
>  ( df <- data.frame(ID   = rep(letters[1:3], each=3),
+                  CatA = rep(1:3, times = 3),
+                  CatB = letters[1:9]) )
  ID CatA CatB
1  a    1    a
2  a    2    b
3  a    3    c
4  b    1    d
5  b    2    e
6  b    3    f
7  c    1    g
8  c    2    h
9  c    3    i
> ( DT <- data.table(df)[, lapply(.SD, function(x) rep(x,3))][, Time:=rep(1:3, each=nrow(df0))] )
    ID CatA CatB Time
 1:  a    1    a    1
 2:  a    2    b    1
 3:  a    3    c    1
 4:  b    1    d    1
 5:  b    2    e    1
 6:  b    3    f    1
 7:  c    1    g    1
 8:  c    2    h    1
 9:  c    3    i    1
10:  a    1    a    2
11:  a    2    b    2
12:  a    3    c    2
13:  b    1    d    2
14:  b    2    e    2
15:  b    3    f    2
16:  c    1    g    2
17:  c    2    h    2
18:  c    3    i    2
19:  a    1    a    3
20:  a    2    b    3
21:  a    3    c    3
22:  b    1    d    3
23:  b    2    e    3
24:  b    3    f    3
25:  c    1    g    3
26:  c    2    h    3
27:  c    3    i    3

Another one :

> library(data.table)
>  ( df <- data.frame(ID   = rep(letters[1:3], each=3),
+                  CatA = rep(1:3, times = 3),
+                  CatB = letters[1:9]) )
> DT <- data.table(df)
> rbindlist(lapply(1:3, function(i) cbind(DT, Time=i)))
    ID CatA CatB Time
 1:  a    1    a    1
 2:  a    2    b    1
 3:  a    3    c    1
 4:  b    1    d    1
 5:  b    2    e    1
 6:  b    3    f    1
 7:  c    1    g    1
 8:  c    2    h    1
 9:  c    3    i    1
10:  a    1    a    2
11:  a    2    b    2
12:  a    3    c    2
13:  b    1    d    2
14:  b    2    e    2
15:  b    3    f    2
16:  c    1    g    2
17:  c    2    h    2
18:  c    3    i    2
19:  a    1    a    3
20:  a    2    b    3
21:  a    3    c    3
22:  b    1    d    3
23:  b    2    e    3
24:  b    3    f    3
25:  c    1    g    3
26:  c    2    h    3
27:  c    3    i    3
Stéphane Laurent
  • 75,186
  • 15
  • 119
  • 225