I am trying to subset an ffdf by date. The subset below works on a normal data frame, but I need help applying the same approach to an ffdf. My attempt, along with the resulting error message, is in the code comments. Many thanks in advance!

#Create a normal data frame (in production this is read directly into an ffdf
#from a CSV file)

start  <- c("01/01/2010", "01/01/2011", "01/01/2012", "01/01/2012", "01/01/2012")
end  <- c("31/12/2010", "31/12/2011", "31/12/2012", "31/12/2012", "31/12/2012")
amount <- c(10,20,30,40,50)
df <- data.frame(start,end,amount)

#Ensure subsetting works on a normal data frame

  #convert type to proper date (this has to be done in production after csv file
  #has been read in)
  df$start <- as.Date(df$start, format="%d/%m/%Y")
  df$end <- as.Date(df$end, format="%d/%m/%Y")

  #Subset
  df <- subset(df, start == as.Date("2012-01-01",format="%Y-%m-%d"))

  #Works :) Now let's try with ffdf

library(ff)
ffdf <- as.ffdf(df)

  #Type conversion for dates (again, applied in production after mammoth csv has
  #been read in)
  ffdf$start <- as.Date(ffdf$start, format="%d/%m/%Y")
  ffdf$end <- as.Date(ffdf$end, format="%d/%m/%Y")

  #Subset
  ffdf <- subset.ff(ffdf, start==as.Date("2012-01-01",format="%Y-%m-%d"))
  #ERROR: Error in ffdf(x = x) : ffdf components must be atomic ff objects
Tyler Durden

1 Answer

Use subset.ffdf from the ffbase package. subset is a generic function in R, and ffbase implements a method for ffdf objects, so you can call subset exactly as you would on a regular data frame.

#Build the example data frame and convert the date columns
df <- data.frame(
  start  = c("01/01/2010", "01/01/2011", "01/01/2012", "01/01/2012", "01/01/2012"),
  end    = c("31/12/2010", "31/12/2011", "31/12/2012", "31/12/2012", "31/12/2012"),
  amount = c(10, 20, 30, 40, 50)
)
df$start <- as.Date(df$start, "%d/%m/%Y")
df$end <- as.Date(df$end, "%d/%m/%Y")

library(ffbase)
myffdf <- as.ffdf(df)
#Call the generic subset(); it dispatches to subset.ffdf
test <- subset(myffdf, start == as.Date("2012-01-01", format="%Y-%m-%d"))
test
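
To illustrate the point about generics, here is a minimal base-R sketch of S3 dispatch. The class toy_table and its helper make_toy_table are hypothetical names made up for this example and have nothing to do with ff or ffbase. Calling the generic subset() picks the method that matches the object's class, which is presumably why calling subset.ff directly on an ffdf (an object that method was not written for) failed in the question.

```r
# Hypothetical class "toy_table": a plain list of equal-length columns.
make_toy_table <- function(...) {
  structure(list(...), class = "toy_table")
}

# Method for the generic subset(). For simplicity `keep` is a logical
# vector, not an unevaluated expression as in subset.data.frame.
subset.toy_table <- function(x, keep, ...) {
  cols <- lapply(unclass(x), function(col) col[keep])
  structure(cols, class = "toy_table")
}

tt <- make_toy_table(
  start  = as.Date(c("01/01/2011", "01/01/2012", "01/01/2012"),
                   format = "%d/%m/%Y"),
  amount = c(20, 30, 40)
)

# The generic dispatches to subset.toy_table because class(tt) is "toy_table".
res <- subset(tt, tt$start == as.Date("2012-01-01"))
res$amount
```

The same mechanism is what lets subset(myffdf, ...) reach the ffdf method once ffbase is loaded, without naming the method explicitly.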
  • Thank you so much. I just tried to run your code, but it returned the following error: Error in UseMethod("as.hi") : no applicable method for 'as.hi' applied to an object of class "NULL". I'm at a loss as to how to fix such a simple piece of code. It looks really promising though. Thank you again. – Tyler Durden Oct 17 '13 at 14:12
  • Thanks for the example. It revealed a small issue in subset.ffdf, which is now fixed in the development version of ffbase. You can install it with library(devtools); install_github("ffbase", "edwindj", subdir="pkg"). This will fix your problem. –  Oct 17 '13 at 17:24
  • Thanks Edwin. That fix worked. It required installing the fixed version of the ffbase package from GitHub as instructed above. I also had to upgrade R to the latest version, as devtools was not available for my old version. In addition, devtools depends on libcurl, which wasn't installed automatically, so I had to install that too with the command "sudo apt-get install libcurl4-openssl-dev" in the main Ubuntu terminal (not the R console). – Tyler Durden Oct 18 '13 at 10:26
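
For readers on a current devtools, the install step from the comments can be sketched as follows. This assumes the repository is still at edwindj/ffbase with the package in its pkg subdirectory, as the comment states. Recent devtools versions take the repository as a single "user/repo" string instead of separate arguments. On Ubuntu the libcurl system library mentioned above may need to be installed first.

```r
# Assumption: the package source lives in the "pkg" subdirectory of the
# edwindj/ffbase GitHub repository, as described in the comment above.
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}

# Current devtools syntax: "user/repo" in one string, plus subdir.
devtools::install_github("edwindj/ffbase", subdir = "pkg")
```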