6

I have 2 data.frames with the following columns.

1) A,B,C,D 2) E,F,G,H

What I'd like to do, is create a new data.frame, which has a row for each element of expand.grid(1[,B]2[,F]) and would keep all other columns and values associated with the values of col B and col F from the original data.frames

I am currently doing this using 2 for loops and this is creating a pretty large running time since the data.frames I'm dealing with are rather large.

Here is a screenshot of what I am looking for:

> aa
  A B C D
1 1 x 3 5
2 2 y 4 6
> bb
  E F  G  H
1 7 j  9 11
2 8 k 10 12
> cc
  A B C D E F  G  H
1 1 x 3 5 7 j  9 11
2 2 y 4 6 7 j  9 11
3 1 x 3 5 8 k 10 12
4 2 y 4 6 8 k 10 12
Maksym
  • 63
  • 1
  • 6
  • Are you sure your desired output is correct? In `cc`, you have two sets of identical rows (1, 3) and (2, 4). – davechilders May 06 '15 at 19:10
  • I'm not sure your example actually makes sense with regards to the verbage of your question. You end up with duplicated rows in cc where from your explanation it sounds like you want the values of B and F to act as your keys for those data.frames but themselves be combined through expand.grid? – Forrest R. Stevens May 06 '15 at 19:13
  • Yes you guys are correct, the output I had was wrong. I edited the question, thanks for point this out. – Maksym May 06 '15 at 19:24

4 Answers4

10

I think , you are looking for :

merge(aa,bb)

  A B C D E F  G  H
1 1 x 3 5 7 j  9 11
2 2 y 4 6 7 j  9 11
3 1 x 3 5 8 k 10 12
4 2 y 4 6 8 k 10 12
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • 1
    Because there are no common columns, you're basically just doing a full outer join. Nice. – MrFlick May 06 '15 at 19:13
  • Thank you, I needed to make some alterations to this because of the format of my tables (they actually share some common columns but no the columns which I want to expand pair-wise) but this did the trick and the running time is significantly lower now. – Maksym May 06 '15 at 19:39
6

At one point I adapted the code in expand.grid to make it easier to group clumps of columns. Here's the code

#available from
#https://gist.github.com/MrFlick/00e2c589a2fa4b6d91f2

Expand.Grid<-function (..., stringsAsFactors = TRUE) 
{
    nargs <- length(args <- list(...))
    if (!nargs) 
        return(as.data.frame(list()))
    if (nargs == 0L) 
        return(as.data.frame(list()))
    Names <- function(x) {if(!is.null(names(x))) names(x) else rep("",length(x))}
    Paste <- function(...) {a<-list(...); r<-do.call("paste", c(list(sep="."),
        a[sapply(a, function(x) !is.character(x) || any(nzchar(x)))]));
        nx <- max(sapply(a, length))
        if (length(r)) return(rep(r, length.out=nx)) else return(rep("", nx))
    }
    contribcols <- sapply(args, function(x) ifelse(class(x)=="data.frame", ncol(x), 1))
    outargs <- sum(contribcols)
    cargs <- vector("list", outargs)
    nmc <- paste0("Var", seq.int(sum(contribcols)))
    nm <- unlist(lapply(seq_along(args), function(x) if(class(args[[x]])=="data.frame") {
        Paste(Names(args)[x], Names(args[[x]])) } else {Names(args)[x]}))
    if (is.null(nm)) 
        nm <- nmc
    else if (any(ng0 <- !nzchar(nm))) 
        nm[ng0] <- nmc[ng0]
    names(cargs) <- make.unique(make.names(nm))
    rep.fac <- 1L
    d <- sapply(args, function(x) ifelse(class(x)=="data.frame", nrow(x), length(x)))
    orep <- prod(d)
    if (orep == 0L) {
        i<-1
        for (a in seq_along(args)) {
            if (contribcols[a]==1) {
                args[[a]]=list(a)
            }
            for(j in seq_len(contribcols[a])) {
                cargs[[i]] <- args[[a]][[j]][FALSE]
                i <- i+1
            }
        }
    } else {    
        i<-1
        for (a in seq_along(args)) {
            nx <- d[a]
            orep <- orep/nx
            x<-args[[a]]
            if (contribcols[a]==1) {
                x<-list(x)
            }
            for(j in seq_len(contribcols[a])) {
                y <- x[[j]]
                y <- y[rep.int(rep.int(seq_len(nx), rep.int(rep.fac, 
                    nx)), orep)]
                if (stringsAsFactors && !is.factor(y) && is.character(y)) 
                    y <- factor(y, levels = unique(y))
                cargs[[i]] <- y
                i <- i+1
            }
            rep.fac <- rep.fac * nx
        }
    }
    rn <- .set_row_names(as.integer(prod(d)))
    structure(cargs, class = "data.frame", row.names = rn)
}

And then you could use it like

aa<-read.table(text="  A B C D
1 1 x 3 5
2 2 y 4 6", header=T)

bb<-read.table(text="  E F  G  H
1 7 j  9 11
2 8 k 10 12", header=T)

Expand.Grid(aa,bb)
#   A B C D E F  G  H
# 1 1 x 3 5 7 j  9 11
# 2 2 y 4 6 7 j  9 11
# 3 1 x 3 5 8 k 10 12
# 4 2 y 4 6 8 k 10 12

It also allows for additional combinations not directly applicable to this question such as

#combine any number of data.frames and atomic vectors
Expand.Grid(aa,other=1:2, bb)
#give columns a prefix
Expand.Grid(x=aa,y=aa)
MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • Cool stuff. It would be neat if it disambiguated the column names like `data.frame` does. I mean, compare: `data.frame(aa,aa)` vs `Expand.Grid(aa,aa)` – Frank May 06 '15 at 19:42
  • 1
    Good suggestion Frank. I've thrown a `make.unique()` in there to clean up the names. – MrFlick May 06 '15 at 19:44
3

You can expand.grid the row numbers:

myg <- expand.grid(aa=1:nrow(aa),bb=1:nrow(bb))
cbind(aa[myg$aa,],bb[myg$bb,])

The row names in the result are a little ugly:

    A B C D E F  G  H
1   1 x 3 5 7 j  9 11
2   2 y 4 6 7 j  9 11
1.1 1 x 3 5 8 k 10 12
2.1 2 y 4 6 8 k 10 12
Frank
  • 66,179
  • 8
  • 96
  • 180
2

Copying the example from above, the crossing function from package tidyr will also do the trick now.

aa <- read.table(text = "  A B C D
             1 1 x 3 5
             2 2 y 4 6", header = T)

bb <- read.table(text = "  E F  G  H
             1 7 j  9 11
             2 8 k 10 12", header = T)


crossing(aa, bb)

Gives

Source: local data frame [4 x 8]

        A      B     C     D     E      F     G     H
      (int) (fctr) (int) (int) (int) (fctr) (int) (int)
  1     1      x     3     5     7      j     9    11
  2     1      x     3     5     8      k    10    12
  3     2      y     4     6     7      j     9    11
  4     2      y     4     6     8      k    10    12
Andrew
  • 733
  • 6
  • 11