I have two data frames. One contains some towns' names and IDs, the other contains some towns' names and IDs plus an extra value:
library(tidyverse)
towns.df <- structure(list(town_id = c(1, 2, 3), town_name = c("Rome", "Madrid", "Sarajevo")),
row.names = c(NA, -3L),
class = "data.frame") %>% as_tibble()
values.df <- structure(list(town_id = c(1, 5, 4), town_name = c("Rome", "Sarajevo", "Madrid"), town_value = c(18, 11, 15)),
row.names = c(NA, -3L),
class = "data.frame") %>% as_tibble()
The data looks like this:
> towns.df
# A tibble: 3 × 2
town_id town_name
<dbl> <chr>
1 1 Rome
2 2 Madrid
3 3 Sarajevo
> values.df
# A tibble: 3 × 3
town_id town_name town_value
<dbl> <chr> <dbl>
1 1 Rome 18
2 5 Sarajevo 11
3 4 Madrid 15
Using a tidyverse
solution, I want to join the data frames based on IDs (for separate reasons I cannot directly do it based on towns' names) but the problem is that the IDs do not always correspond. The IDs that get priority are those found in towns.df
, e.g. if the same town has two different IDs in the two data frames, I want it to eventually be associated to the one from towns.df
.
So I first check what IDs from towns.df
are not present in values.df
by using anti_join()
:
> anti_join(towns.df, values.df, "town_id")
# A tibble: 2 × 2
town_id town_name
<dbl> <chr>
1 2 Madrid
2 3 Sarajevo
Then I want to take the corresponding towns and be able to modify values.df$town_id
accordingly, so that the ID would be the one displayed in towns.df
.
Or better, I want to directly create a joint data frame where I have town_id
, town_name
and town_value
as columns and with the correct town_id
.
The desired output is:
# A tibble: 3 × 3
town_id town_name town_value
<dbl> <chr> <dbl>
1 1 Rome 18
2 2 Madrid 15
3 3 Sarajevo 11
I am aware that in this example the desired output could be obtained by joining the data frames based on town_name
- i.e. this would do the trick:
left_join(towns.df, values.df[,2:3], "town_name")
but, as I said, this is something I am not able to do due to separate reasons that are exogenous to this question.
I checked this answer but I am not looking to using mutate
with a literal value, instead I need to use a different column (i.e. town_id
) than the one I am using to check the identity (i.e. town_name
).