I have df1 with 4 columns (let's call them a, b, c and d) and df2 with 2 columns (a and b). I'd like to add to df2 the columns from df1 that it lacks (c and d), filled with NAs, so that I can then combine the two. In plain R, the code would be the following (if I'm not mistaken):

mdf <- plyr::rbind.fill(df1, df2)

But this doesn't work with SparkR's DataFrames: Error: All inputs to rbind.fill must be data.frames

How can I do that with functions that work on SparkR DataFrames?

(Obviously, I'd like something maintainable, not something which is basically adding each column by hand like df2$c <-)

Thanks

(While I'm at it: names(df1) %in% names(df2) gives me [1] TRUE TRUE FALSE FALSE, and which(names(df1) %in% names(df2)) gives me [1] 1 2. Which function should I use to get the names of the columns instead, i.e. [1] a b?)

François M.

1 Answer

To answer the second part of your question: you can use setdiff (available in base R and also as dplyr::setdiff) to find, by name, the columns that are present in one data frame but not in the other.

Setting up two character vectors that stand in for the column names (in practice, use names(df1) and names(df2)):

df1 <- c(LETTERS[1:4])
df2 <- c(LETTERS[3:4])

df1
[1] "A" "B" "C" "D"
df2
[1] "C" "D"

To find the columns of the first that are not present in the second, use

setdiff(df1, df2)
[1] "A" "B"

or to find the columns that are present in both data frames

intersect(df1, df2)
[1] "C" "D"
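
As for the first part of the question, the same two set operations can drive the NA fill without naming each column by hand. The following is a minimal sketch on ordinary local data frames, not on SparkR DataFrames; the toy values and the variable names common, common_mask, missing_cols and mdf are made up for illustration, and df1 and df2 are redefined here as data frames rather than the character vectors used above.

```r
# Toy local data frames standing in for the question's df1 and df2
# (illustrative values only).
df1 <- data.frame(a = 1:2, b = 3:4, c = c("x", "y"), d = c(TRUE, FALSE),
                  stringsAsFactors = FALSE)
df2 <- data.frame(a = 5:6, b = 7:8)

# Names (not positions) of the columns the two frames share.
common <- intersect(names(df1), names(df2))

# Equivalent: use the %in% mask from the question to index the names.
common_mask <- names(df1)[names(df1) %in% names(df2)]

# Columns that df2 lacks relative to df1.
missing_cols <- setdiff(names(df1), names(df2))

# Add each missing column filled with NA, then reorder df2's columns
# to match df1 and stack the two frames.
for (col in missing_cols) {
  df2[[col]] <- NA
}
mdf <- rbind(df1, df2[names(df1)])
```

On SparkR DataFrames the same loop could presumably be written with columns() in place of names(), and withColumn() with lit(NA) cast to the appropriate type in place of the df2[[col]] <- NA assignment, before binding the rows. That variant is untested here and should be checked against the SparkR version in use.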