1

I have two data frames. One contains some towns' names and IDs, the other contains some towns' names and IDs plus an extra value:

library(tidyverse)

towns.df <- structure(list(town_id = c(1, 2, 3), town_name = c("Rome", "Madrid", "Sarajevo")),
                      row.names = c(NA, -3L),
                      class = "data.frame") %>% as_tibble()

values.df <- structure(list(town_id = c(1, 5, 4), town_name = c("Rome", "Sarajevo", "Madrid"), town_value = c(18, 11, 15)),
                       row.names = c(NA, -3L),
                       class = "data.frame") %>% as_tibble()

The data looks like this:

> towns.df
# A tibble: 3 × 2
  town_id town_name
    <dbl> <chr>    
1       1 Rome     
2       2 Madrid   
3       3 Sarajevo

> values.df
# A tibble: 3 × 3
  town_id town_name town_value
    <dbl> <chr>          <dbl>
1       1 Rome              18
2       5 Sarajevo          11
3       4 Madrid            15

Using a tidyverse solution, I want to join the data frames based on IDs (for separate reasons I cannot directly do it based on towns' names) but the problem is that the IDs do not always correspond. The IDs that get priority are those found in towns.df, e.g. if the same town has two different IDs in the two data frames, I want it to eventually be associated to the one from towns.df.

So I first check what IDs from towns.df are not present in values.df by using anti_join():

> anti_join(towns.df, values.df, "town_id")
# A tibble: 2 × 2
  town_id town_name
    <dbl> <chr>    
1       2 Madrid   
2       3 Sarajevo 

Then I want to take the corresponding towns and be able to modify values.df$town_id accordingly, so that the ID would be the one displayed in towns.df.

Or better, I want to directly create a joint data frame where I have town_id, town_name and town_value as columns and with the correct town_id.

The desired output is:

# A tibble: 3 × 3
  town_id town_name town_value
    <dbl> <chr>          <dbl>
1       1 Rome              18
2       2 Madrid            15
3       3 Sarajevo          11

I am aware that in this example the desired output could be obtained by joining the data frames based on town_name - i.e. this would do the trick:

left_join(towns.df, values.df[,2:3], "town_name")

but, as I said, this is something I am not able to do due to separate reasons that are exogenous to this question.

I checked this answer but I am not looking to using mutate with a literal value, instead I need to use a different column (i.e. town_id) than the one I am using to check the identity (i.e. town_name).

user438383
  • 5,716
  • 8
  • 28
  • 43
Matteo
  • 2,774
  • 1
  • 6
  • 22

1 Answers1

1

We may use

library(powerjoin)
power_left_join(towns.df, values.df, by = "town_name",  conflict = \(x, y) x)
akrun
  • 874,273
  • 37
  • 540
  • 662