How to join two data frames without introducing NAs and duplicate columns?

Question

My data frame has some missing values for certain variables. I have put together another data frame that has those missing values. When I try to combine them, it introduces NA values with duplicate columns. I have tried all four joins (left, right, full, inner, anti), but nothing is working. I'm missing some command in left_join, but not sure which one.How do I join these?

dat_nc <- left_join(dat_NC, dat_missing, by = c("year", "location", "spread_event"))

I am expecting to replace missing values with the values from another data frame.

As no data for the two datasets is available, I have to resort to guessing: are you sure, the columns you are joining by are actually structually the same (datatype etc.)? — Dom42, Jan 04 '23 at 00:55
I don't see any common columns between the 2 data frames. How do they relate? Assume row 1 from table 1 goes with row 1 of table 2? — Dave2e, Jan 04 '23 at 00:58
`*_join` by itself will duplicate your columns rather than replacing the `NA` values. Have a look here for a possible solution: https://stackoverflow.com/questions/39110179/can-i-replace-nas-when-joining-two-data-frames-with-dplyr — Stewart Macdonald, Jan 04 '23 at 01:04
@Dome42 yes, they are the same type. A few values in certain rows are missing. Column titles are same in both data sets. — Ahsk, Jan 04 '23 at 01:06
@Dave2e location, year, and spread_event are in both data sets. I just wanted to show missing data that's why you can't see the whole data sets, which is rather large — Ahsk, Jan 04 '23 at 01:08
If a row has NA for one of the key columns, what result do you want? Should it only appear in the output and be linked if if it NA in both tables? In that case you might convert the NA to "NA" so that it is not linked to all values. Where you are trying to "replace missing values from other data frame," are the available columns distinct enough to uniquely link? — Jon Spring, Jan 04 '23 at 01:34
Does this answer your question? [Can I replace NAs when joining two data frames with dplyr?](https://stackoverflow.com/questions/39110179/can-i-replace-nas-when-joining-two-data-frames-with-dplyr) — John Polo, Jan 04 '23 at 01:58
@JonSpring I'm trying to replace "replace missing values from other data frame,". Only column names are overlapping - values are distinct. — Ahsk, Jan 04 '23 at 12:17
Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. — r2evans, Jan 04 '23 at 13:10
@JohnPolo Yes, thank you very much. I just needed to use distinct() function to get rid of duplicate rows after joining. If you could answer the question please, I will accept your answer to close the question. Please mentioned distinct() function in your answer too. Thanks again — Ahsk, Jan 04 '23 at 13:28
@JohnPolo This was the full code used in the solution dat_nc <- left_join(dat_NC, dat_missing, by = c("year", "location", "spread_event")) %>% mutate(mean_ws = coalesce(mean_ws.x, mean_ws.y)) %>% select(-mean_ws.x, -mean_ws.y) %>% mutate(mean_temp = coalesce(mean_temp.x, mean_temp.y)) %>% select(-mean_temp.x, -mean_temp.y) %>% mutate(mean_rh = coalesce(mean_rh.x, mean_rh.y)) %>% select(-mean_rh.x, -mean_rh.y) %>% mutate(mean_wd = coalesce(mean_wd.x, mean_wd.y)) %>% select(-mean_wd.x, -mean_wd.y) %>% distinct() — Ahsk, Jan 04 '23 at 13:43
If the code in the comment works, you can write the answer yourself. I don't feel like I should take code you wrote to write an answer that gives me credit. Don't forget to click "Accept" on the answer after you have written the answer. — John Polo, Jan 04 '23 at 14:58
I have answered the question. Stackoverflow will let me accept it tomorrow. — Ahsk, Jan 04 '23 at 15:17

Ahsk · Accepted Answer · 2023-01-04T15:08:11.673

I needed to use coalesce() function after joining to "replace NA values with the data from the second data frame," and then distinct function to get rid of duplicate values introduced by coalesce() function.

dat_nc <-
  left_join(dat_NC, dat_missing, by = c("year", "location", "spread_event")) %>%
  mutate(mean_ws = coalesce(mean_ws.x, mean_ws.y)) %>%
  select(-mean_ws.x, -mean_ws.y) %>%
  mutate(mean_temp = coalesce(mean_temp.x, mean_temp.y)) %>%
  select(-mean_temp.x, -mean_temp.y) %>%
  mutate(mean_rh = coalesce(mean_rh.x, mean_rh.y)) %>%
  select(-mean_rh.x, -mean_rh.y) %>%
  mutate(mean_wd = coalesce(mean_wd.x, mean_wd.y)) %>%
  select(-mean_wd.x, -mean_wd.y) %>%
  mutate(lwd_duration = coalesce(lwd_duration.x, lwd_duration.y)) %>%
  select(-lwd_duration.x, -lwd_duration.y) %>%
  distinct()

How to join two data frames without introducing NAs and duplicate columns?

1 Answers1