0

I'm using dplyr (actually dbplyr) and I have 2 tables. I've tried to reproduce samples below:

df1 <- tibble('y' = c(2001,2001,2001,2001,2002,2002,2002,2002), 'id' = c(7,8,7,8,7,8,7,8), 'c' = c(0,0,0,NA,NA,1,1,NA))

      y    id     c
  <dbl> <dbl> <dbl>
1  2001     7     0
2  2001     8     0
3  2001     7     0
4  2001     8    NA
5  2002     7    NA
6  2002     8     1
7  2002     7     1
8  2002     8    NA

and:

df2 <- tibble('y' = c(2001,2001,2002), 'id' = c(7,8,7), 'c' = c(1,1,1))
      y    id     c
  <dbl> <dbl> <dbl>
1  2001     7     1
2  2001     8     1
3  2002     7     1

What I want to do is merge the column c, so that I end up with df1, but with any relevant values, replaced in from df2.

      y    id     c
  <dbl> <dbl> <dbl>
1  2001     7     0
2  2001     8     0
3  2001     7     0
4  2001     8     1
5  2002     7     1
6  2002     8     1
7  2002     7     1
8  2002     8    NA

In this example I have only put one column, c, but in reality my data has hundreds of columns, and I would like to do this across all the columns.

I've seen this answer: Merge 2 columns into 1 column but its in Python, and only for 1 column.

I've tried using regular merges, but it doesn't work. I can do a merge, keeping both columns, then use a mutate with an ifelse statement, to check if the df2 column has a NA, and use the df2 column if it isn't, otherwise keep the df1 column. But this is only for 1 column, and I need this across multiple columns.

Any help would be appreciated

DataMonkey
  • 104
  • 7

0 Answers0