I am trying to left-join df2
onto df1
.
df1
is my dataframe of interest, df2
contains additional information I need.
Example:
#df of interest onto which the other should be joined
key1 <- c("London", "Paris", "Berlin", "Delhi")
other_stuff <- c("Tea", "Coffee", "Beer", "Tea")
df1 <- data.frame(key1, other_stuff)
#additional info df
key2 <- c("London and other cities", "some other city", "Eastberlin is history", "Berlin", "Delia is a name", "Delhi is a place")
more_info <- c("history", "languages", "trades", "art", "commerce", "manufacturing")
df2 <- data.frame(key2,more_info)
What I now want is that df2$key2
is searched for the precise occurrence of df1$key1
and then merged onto df1
(e.g. match Berlin to Berlin, but not to Eastberlin, and Delhi to Delhi but not to Delia) while ignoring the other words surrounding the match.
Desired outcome:
key1 | other_stuff | more_info |
---|---|---|
London | Tea | history |
Paris | Coffee | NA |
Berlin | Beer | art |
Delhi | Tea | manufacturing |
I tried variations of regex_left_join
joined<- regex_left_join(df1,df2, by = c("key1" = "key2"), ignore_case= F)
and fuzzyjoins
joined<- df1%>% fuzzy_left_join(df2, by = c("key1" = "key2"), match_fun = str_detect)
They both only give a result for the exact match (key1=key2=Berlin) and give NA for everything else.
How do I do this?
I also tried Merging two tables where one column is substring of the other in R but the logic in the SQL there is the wrong way around. I tried several other Stackexchange approaches, but they are "too fuzzy" for my data.