I need an outer join of ffdf data frames stored in a list. I have checked this, but it refers to a different problem. Example code for in-RAM objects:

x1 = data.frame(name='a1', Ai=2, Ac=1, Bi=1)
x2 = data.frame(name='a2', Ai=1, Bi=3, Bc=1, Ci=1)
x3 = data.frame(name='a3', Ai=3, Ac=2, Bi=2, Ci=3, Cc=1, Di=2, Dc=2)
x4 = data.frame(name='a4', Ai=3, Bi=2, Ci=1, Fi=2)
dl = list(x1,x2,x3,x4)
mergedDF = Reduce(function(...) merge(..., all=TRUE), dl)
mergedDF[is.na(mergedDF)] = 0

The desired result looks like:

mergedDF
  name Ai Bi Ci Ac Bc Cc Di Dc Fi
1   a1  2  1  0  1  0  0  0  0  0
2   a2  1  3  1  0  1  0  0  0  0
3   a3  3  2  3  2  0  1  2  2  0
4   a4  3  2  1  0  0  0  0  0  2

As soon as I convert the data frames to ffdf, though, I get the error

Error in merge.ffdf(..., all = T) : merge.ffdf only allows inner joins

Any known workarounds? Many thanks in advance.

Audrey
  • If I understand your question correctly, the development version of ffbase contains a function called `ffdfrbind.fill` (similar to rbind.fill). `library(devtools); install_github("edwindj/ffbase", subdir="pkg")` will install that development version. Normally `ffdfrbind.fill(x1, x2, x3, x4)` will get you there. –  Mar 04 '14 at 08:45
  • rbind.fill functionality is what is needed indeed. Unfortunately I get this error when I try `install_github("edwindj/ffbase", subdir="pkg")` : `ERROR: compilation failed for package 'ffbase'` – Audrey Mar 04 '14 at 10:35
  • I believe you are working on Windows. If you want to install the package from source, as install_github does, you need to have Rtools installed. Do you have Rtools installed? http://cran.r-project.org/bin/windows/Rtools/ –  Mar 04 '14 at 10:38
  • `Warning message: package ‘Rtools’ is not available (for R version 3.0.2)` – Audrey Mar 04 '14 at 11:07
  • Is Rtools in your path? Maybe you need to restart your computer before it shows up in your path. –  Mar 04 '14 at 12:14
  • Managed to install it, still function `ffdfrbind.fill` is missing :-( – Audrey Mar 04 '14 at 12:41
  • Ah, right, it is not in the namespace yet. You can access it for the time being as `ffbase:::ffdfrbind.fill`. After the next release of ffbase, you can remove the `ffbase:::` part. –  Mar 04 '14 at 12:44
  • Awesome as usual. Problem solved. – Audrey Mar 04 '14 at 12:48

1 Answer


This post helped me: "Combine two data frames by rows (rbind) when they have different sets of columns". To do something similar with your data:

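   install.packages('plyr')  # only needed once
   library(plyr)
   # rbind.fill stacks the rows and fills columns missing from a data frame with NA
   answer <- Reduce(rbind.fill, dl)
   answer[is.na(answer)] <- 0
   answer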

  name Ai Ac Bi Bc Ci Cc Di Dc Fi
1   a1  2  1  1  0  0  0  0  0  0
2   a2  1  0  3  1  1  0  0  0  0
3   a3  3  2  2  0  3  1  2  2  0
4   a4  3  0  2  0  1  0  0  0  2

BTW, nice idea using Reduce; it's a nifty little function that rarely gets used (at least by me).
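
For reference, the same "stack rows and fill missing columns" idea can be sketched in base R without plyr. This is only an illustration on ordinary in-RAM data frames, not on ffdf objects; `fill_missing` is a hypothetical helper name introduced here, and it assumes every column that has to be filled can hold NA.

```r
# Same example data as in the question
x1 <- data.frame(name = "a1", Ai = 2, Ac = 1, Bi = 1)
x2 <- data.frame(name = "a2", Ai = 1, Bi = 3, Bc = 1, Ci = 1)
x3 <- data.frame(name = "a3", Ai = 3, Ac = 2, Bi = 2, Ci = 3, Cc = 1, Di = 2, Dc = 2)
x4 <- data.frame(name = "a4", Ai = 3, Bi = 2, Ci = 1, Fi = 2)
dl <- list(x1, x2, x3, x4)

# Union of all column names, in order of first appearance
all_cols <- unique(unlist(lapply(dl, names)))

# Hypothetical helper: add each missing column as NA, then align the column order
fill_missing <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA
  }
  df[cols]
}

# Every data frame now has identical columns, so a plain rbind works
filled <- do.call(rbind, lapply(dl, fill_missing, cols = all_cols))
filled[is.na(filled)] <- 0
filled
```

Because every row has a distinct `name`, stacking rows this way gives the same rows as the outer join in the question, without needing `merge(..., all=TRUE)`.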

James Tobin