Replace all nicknames with full names based on a different dataframe in R

Question

I have a dataframe with names that are a mixture of full names and nicknames. I want to replace all of the nicknames in that dataframe with full names from a different dataset.

temp <- data.frame("Id" = c(1,2,3,4,5), "name" = c("abe", "bob", "tim","timothy", "Joe"))
temp2 <-data.frame("name" = c("abraham", "robert", "timothy","joseph"),"nickname1"=c("abe", "rob", "tim","joe"),"nickname2"=c("", "bob", "","joey"))

If the name column in temp appears in either nickname1 or nickname2 in temp2, replace with the value in the name column of temp2.

so it should look like this at the end:

 temp3<- data.frame("Id" = c(1,2,3,4,5), "name" = c("abraham", "robert", "timothy","timothy", "Joseph"))

You want to make your `temp2` a 'long' file, instead of having `nickname1`, `nickname2` etc. you would just have `name` and `nickname` with repeated rows of `name`. Then it will be a merge/join operation as per https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right — thelatemail, Mar 16 '20 at 01:30

score 3 · Accepted Answer · answered Mar 16 '20 at 01:38

As mentioned by @thelatemail, you can get the data in long format and then do a join. Also, you have data in upper as well as lower case, make it into one uniform case before doing the join. If the value is present in temp2, you can select that or else keep the temp value using coalesce.

library(dplyr)

temp2 %>% 
  tidyr::pivot_longer(cols = -name, names_to = 'nickname') %>%
  filter(value != '') %>%
  mutate(name = tolower(name)) %>%
  right_join(temp %>% mutate(name = tolower(name)), by = c('value' = 'name')) %>%
  mutate(name = coalesce(name, value)) %>%
  select(Id, name)

#     Id name   
#  <dbl> <chr>  
#1     1 abraham
#2     2 robert 
#3     3 timothy
#4     4 timothy
#5     5 joseph

Replace all nicknames with full names based on a different dataframe in R

1 Answers1