Suppose I have two ffdf objects:

library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10

How can I column bind these without loading them into memory? cbind doesn't work.

The same question was asked at http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r, but it has no MWE and its author abandoned it, so I am reposting it.

user2763361
  • combining without loading into memory...? What exactly would that look like? – Ricardo Saporta Dec 16 '13 at 03:15
  • @RicardoSaporta I don't know. I can do a buttload of other things with ff objects without loading the original full data frame into memory, so I thought `cbind` may be possible too. – user2763361 Dec 16 '13 at 03:16

2 Answers

You can use the following function, cbind.ffdf2, provided the column names of the two input ffdf objects do not overlap:

library(ff)
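# ff has no character storage mode, so the letter columns must be factors
# (data.frame() stopped converting strings to factors by default in R 4.0.0)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))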

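cbind.ffdf2 <- function(d1, d2){
  D1names <- colnames(d1)
  D2names <- colnames(d2)
  # physical() returns the list of ff vectors behind each ffdf;
  # ffdf() assembles them into a new ffdf
  mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(mergeCall) <- c(D1names, D2names)
  mergeCall
}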

cbind.ffdf2(ff1, ff2)[,]

Result:

  letA numA letB numB
1    a    1    f    6
2    b    2    g    7
3    c    3    h    8
4    d    4    i    9
5    e    5    j   10
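
The function above assumes that both inputs are ffdf objects with the same number of rows and non-overlapping column names. A minimal sketch of a wrapper that checks those assumptions first could look like this; `safe_cbind_ffdf` is a hypothetical name, not something provided by ff or ffbase:

```r
library(ff)

# Hypothetical helper (not part of ff or ffbase): column-binds two ffdf
# objects after checking the preconditions the answer relies on.
safe_cbind_ffdf <- function(d1, d2) {
  stopifnot(is.ffdf(d1), is.ffdf(d2))
  if (nrow(d1) != nrow(d2)) {
    stop("inputs must have the same number of rows")
  }
  dup <- intersect(colnames(d1), colnames(d2))
  if (length(dup) > 0L) {
    stop("duplicate column names: ", paste(dup, collapse = ", "))
  }
  # Same construction as cbind.ffdf2: reuse the underlying ff vectors
  out <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(out) <- c(colnames(d1), colnames(d2))
  out
}

# stringsAsFactors = TRUE because ff cannot store character vectors
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))

combined <- safe_cbind_ffdf(ff1, ff2)
is.ffdf(combined)   # check that the result is still an ffdf
dim(combined)       # rows and columns of the combined object
```

Calling `safe_cbind_ffdf(ff1, ff1)` should stop with the duplicate-name error rather than build an ffdf with ambiguous columns.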
jbaums
Audrey
  • Elegant solution. If you would like this incorporated into ffbase, do post a feature request at https://github.com/edwindj/ffbase/issues –  Mar 07 '14 at 11:37
  • Thanks. Cannot seem to find a 'new feature' category. Enhancement? – Audrey Mar 07 '14 at 12:11
  • Click 'new issue' and flag it as enhancement –  Mar 07 '14 at 12:13
  • Raised it. Finding out how to flag it with any label proved beyond my intuition (this or github's UX for noobs is bad), so it stayed un-labeled :-) – Audrey Mar 07 '14 at 13:10

Sorry for joining this late. If you want to cbind an arbitrary number of ffdf objects without worrying about duplicate columns, you can try this (building on Audrey's solution).

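library(ff)
# stringsAsFactors = TRUE because ff cannot store character vectors
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
ff2 <- as.ffdf(data.frame(letA = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))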

cbind.ffdf2 <- function(...){
  argl <- list(...)
  if(length(argl) == 1L){
    return(argl[[1]])
  }else{
    physicalList <- NULL
    for(i in seq_along(argl)){
      # is.data.frame() is safer than comparing class() to a string,
      # because class() can return more than one element
      if(is.data.frame(argl[[i]])){
        physicalList <- c(physicalList, physical(as.ffdf(argl[[i]])))
      }else{
        physicalList <- c(physicalList, physical(argl[[i]]))
      }
    }
    # Build a new ffdf from the collected ff vectors
    mergeCall <- do.call("ffdf", physicalList)
    return(mergeCall)
  }
}

cbind.ffdf2(ff1, ff2)

It also coerces any data frame in the argument list to an ffdf object.
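
To illustrate the coercion, here is a minimal sketch that mixes ffdf objects with an ordinary data frame. It uses a compact `lapply`-based variant of the function above; `cbind_ffdf_all` is a hypothetical name chosen for this example, and the inputs are assumed to have equal row counts and distinct column names:

```r
library(ff)

# Hypothetical compact variant of the variadic function above
cbind_ffdf_all <- function(...) {
  parts <- lapply(list(...), function(x) {
    # Plain data frames are converted to ffdf before extracting ff vectors
    if (is.data.frame(x)) x <- as.ffdf(x)
    physical(x)
  })
  # Flatten the per-input lists into one list of ff vectors
  do.call("ffdf", do.call("c", parts))
}

ff1 <- as.ffdf(data.frame(numA = 1:5))
df2 <- data.frame(numB = 6:10)            # ordinary in-memory data frame
ff3 <- as.ffdf(data.frame(numC = 11:15))

out <- cbind_ffdf_all(ff1, df2, ff3)
is.ffdf(out)   # check that the result is an ffdf
dim(out)       # rows and columns of the combined object
```

Note that the data frame argument is already in memory when it is passed in, so only the ffdf inputs avoid being loaded.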