Columns of DF2 that don't exist in DF1 %>% create them %>% fill them with NAs

Question

I have df1 with 4 columns (let's call them a, b, c and d) and df2 with 2 columns (a and b). I'd like to add to df2 the columns it lacks from df1 (so c and d), fill them with NAs, and then combine the two row-wise. With ordinary R data.frames, the code would be the following (if I'm not mistaken):

mdf <- plyr::rbind.fill(df1, df2)

But this doesn't work with SparkR's DataFrames; it fails with: Error: All inputs to rbind.fill must be data.frames

How can I do that with functions that work on SparkR DataFrames?

(Obviously, I'd like something maintainable, not something that amounts to adding each column by hand, like df2$c <- ...)
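For reference, the logic that rbind.fill applies can be spelled out with base R on ordinary data.frames: find the columns the second frame lacks, add them filled with NA, then row-bind. The helper pad_missing_cols below is hypothetical (not part of plyr or SparkR), and the contents of df1 and df2 are made up for illustration. This sketch only runs on plain data.frames; it does not show the SparkR column functions that would be needed to do the same thing on a SparkR DataFrame.

```r
# Hypothetical helper (not from plyr or SparkR): add to `df` every column
# of `template` that `df` lacks, filled with NA, and put the columns in
# the template's order (any extra columns of `df` are kept at the end).
pad_missing_cols <- function(df, template) {
  absent <- setdiff(names(template), names(df))
  for (col in absent) {
    # rep() keeps this working even when df has zero rows
    df[[col]] <- rep(NA, nrow(df))
  }
  df[, union(names(template), names(df)), drop = FALSE]
}

# Made-up example data matching the shapes in the question
df1 <- data.frame(a = 1:2, b = c("x", "y"), c = c(TRUE, FALSE), d = c(0.5, 1.5))
df2 <- data.frame(a = 3L, b = "z")

# rbind() on data.frames matches columns by name; the logical NA
# columns are coerced to the type of the matching df1 column.
mdf <- rbind(df1, pad_missing_cols(df2, df1))
print(mdf)
```

Because the missing names are computed with setdiff() rather than listed by hand, the same call keeps working if df1 gains or loses columns.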

Thanks

(While I'm at it: names(df1) %in% names(df2) gives me [1] TRUE TRUE FALSE FALSE, and which(names(df1) %in% names(df2)) gives me [1] 1 2. What function should I use to get the names of the columns instead, i.e. [1] "a" "b"?)
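One way to get the names rather than TRUE/FALSE or positions is to use the logical vector returned by %in% to index names(df1) directly. The small df1 and df2 below are made-up stand-ins with the column names from the question.

```r
# Made-up data frames with the column names from the question
df1 <- data.frame(a = 1, b = 2, c = 3, d = 4)
df2 <- data.frame(a = 5, b = 6)

# Logical indexing: keep the names of df1 for which %in% returned TRUE
shared <- names(df1)[names(df1) %in% names(df2)]
print(shared)
#> [1] "a" "b"

# Negating the test gives the columns df2 lacks
# (!x %in% y parses as !(x %in% y))
lacking <- names(df1)[!names(df1) %in% names(df2)]
print(lacking)
#> [1] "c" "d"
```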

A [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) example may help. — Vincent Bonhomme, Jun 08 '16 at 12:22
According to the error message, I bet `df1` or `df2` is not of the class `data.frame`. What do you get for `class(df1)` and `class(df2)`? — Qaswed, Jun 08 '16 at 12:30

score 0 · Answer 1 · answered Dec 29 '16 at 14:11

To answer the second part of your question: you can use setdiff (base R; dplyr also exports a version) to find, by name, the columns of one data frame that are not present in the other.

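Setting up the example (here plain character vectors stand in for the column names of each data frame; with real data frames you would pass names(df1) and names(df2)):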

df1 <- c(LETTERS[1:4])
df2 <- c(LETTERS[3:4])

df1
[1] "A" "B" "C" "D"
df2
[1] "C" "D"

To find the columns of df1 that are not present in df2, use

setdiff(df1, df2)
[1] "A" "B"

or to find the columns that are present in both data frames

intersect(df1, df2)
[1] "C" "D"
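Applied to actual data frames, the same calls take names() of each frame. The data below is made up to mirror the vectors above. Note that setdiff() is not symmetric: it returns the elements of its first argument that are missing from the second.

```r
# Made-up data frames whose column names mirror the vectors above
df1 <- data.frame(A = 1, B = 2, C = 3, D = 4)
df2 <- data.frame(C = 5, D = 6)

# Columns of df1 that df2 lacks (order follows df1)
only_in_df1 <- setdiff(names(df1), names(df2))
print(only_in_df1)
#> [1] "A" "B"

# Columns present in both
in_both <- intersect(names(df1), names(df2))
print(in_both)
#> [1] "C" "D"

# Reversing the arguments: df2 has no columns that df1 lacks
only_in_df2 <- setdiff(names(df2), names(df1))
print(only_in_df2)
#> character(0)
```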
