0

How can I perform a left_join using dplyr with two different keys and then, when the second key is not available, join the tables by using only the first key.

Thanks you

EDIT

Here is the logic:

If the two keys are found in the dataset, use both keys. If the second key is not found (NA or no matching) in the dataset, use only the first one as merging key.

John E.
  • 137
  • 2
  • 10
  • 1
    Hi John, it will be much easier to help if you provide at least a sample of your two data.frames with `dput(df1)` and `dput(df2)`, the code you've tried so far, and importantly your expected output. You can [edit] your question and paste the output. Please surround the output with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/) for more info. – Ian Campbell Jul 16 '20 at 19:46

1 Answers1

1

You can try something along these lines:

library(tidyverse)

df1 <- tibble(
  key1 = c("A", "B"),
  key2 = c(1, 2),
  value_df1 = runif(2)
)

df2 <- tibble(
  key1 = c("A", "B"),
  key2 = c(1, NA),
  value_df2 = runif(2)
)

df_merged <- df1 %>%
  left_join(df2, by = c("key1", "key2")) %>%
  left_join(df2 %>% select(-key2), by = "key1") %>%
  mutate(value2 = coalesce(value_df2.x, value_df2.y)) %>%
  select(key1, key2, value_df1, value_df2)
Jakub.Novotny
  • 2,912
  • 2
  • 6
  • 21
  • Not exactly. If the two keys are inside the dataset, use both keys. If the second key is not in the dataset, use only the first one as merging key. I am not sure if your solution works – John E. Jul 17 '20 at 00:42