
I have been using the `mapvalues` function in R to convert pre-defined character strings into their desired form. However, some of these character strings are not unique, and what I would like to convert them to depends on a criterion in another column.

For example take the following dataframe:

df <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
             Country = c("China", "Germany", "England", "America", "India", "America"))

I would like to convert the American Caroline to 'Caz' using another data frame (where the desired form is specified):

dfmap <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"), 
                Country = c("China", "Germany", "England", "America", "India", "America"),
                NameCorrect = c("Audrey", "Belinda", "Caroline", "Caz", "Dina", "Erica"), 
                CountryCorrect = c("China", "Germany", "England", "America", "India", "America"))

I can't simply use the `mapvalues` function on the `Name` and `NameCorrect` columns, because it won't be able to differentiate the English Caroline from the American Caroline.
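To see the problem concretely, here is a minimal sketch that uses base R's `match()` as a stand-in for any lookup keyed on `Name` alone (an assumption: it illustrates the single-key limitation in general, not `mapvalues`'s internals). `match()` returns the first matching row of `dfmap`, so both Carolines resolve to the English entry and "Caz" is never produced.

```r
# Example data from the question (with the missing comma restored)
df <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                 Country = c("China", "Germany", "England", "America", "India", "America"),
                 stringsAsFactors = FALSE)
dfmap <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                    Country = c("China", "Germany", "England", "America", "India", "America"),
                    NameCorrect = c("Audrey", "Belinda", "Caroline", "Caz", "Dina", "Erica"),
                    CountryCorrect = c("China", "Germany", "England", "America", "India", "America"),
                    stringsAsFactors = FALSE)

# Lookup on Name only: match() returns the FIRST matching row in dfmap,
# so both Carolines point at row 3 (the English Caroline)
idx_name_only <- match(df$Name, dfmap$Name)
print(idx_name_only)

# Consequently the American Caroline stays "Caroline" instead of becoming "Caz"
name_only <- dfmap$NameCorrect[idx_name_only]
print(name_only)
```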

How can I get R to differentiate between the American Caroline and English Caroline, and then map the values to the desired output stored in the dfmap dataframe?
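One approach, along the lines zx8754 suggests in the comments, is to join on both `Name` and `Country`, overwrite the original columns with the corrected ones, and drop the lookup columns. The sketch below assumes every row of `df` is meant to be looked up in `dfmap`; the helper column `row_id` is a hypothetical name added only to restore the original row order, since `merge()` may reorder rows.

```r
df <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                 Country = c("China", "Germany", "England", "America", "India", "America"),
                 stringsAsFactors = FALSE)
dfmap <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                    Country = c("China", "Germany", "England", "America", "India", "America"),
                    NameCorrect = c("Audrey", "Belinda", "Caroline", "Caz", "Dina", "Erica"),
                    CountryCorrect = c("China", "Germany", "England", "America", "India", "America"),
                    stringsAsFactors = FALSE)

# Hypothetical helper column: remember the original row order
df$row_id <- seq_len(nrow(df))

# Left join on BOTH key columns, so each (Name, Country) pair finds its own row
merged <- merge(df, dfmap, by = c("Name", "Country"), all.x = TRUE, sort = FALSE)
merged <- merged[order(merged$row_id), ]

# Overwrite the original columns wherever a corrected value was found
has_match <- !is.na(merged$NameCorrect)
merged$Name[has_match] <- merged$NameCorrect[has_match]
merged$Country[has_match] <- merged$CountryCorrect[has_match]

# Drop the helper and lookup columns to get back the original shape
result <- merged[, c("Name", "Country")]
rownames(result) <- NULL
print(result)
```

Rows with no entry in `dfmap` keep their original values, because `all.x = TRUE` keeps them with `NA` in the lookup columns and `has_match` skips them.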

Will T-E
  • Maybe merge by 2 columns? – zx8754 Aug 23 '16 at 12:13
  • Agree with zx8754; if that's not an option, create a unique ID with `paste(Name, Country)`. – Nate Aug 23 '16 at 12:18
  • As mentioned by @NathanDay, why don't you concatenate `Name` and `Country`, use this as a match key, and use the `match` command on the two columns? – Shiva Prakash Aug 23 '16 at 12:22
  • @zx8754 If I merge, how do I then unmerge so that I can put them back in their correct columns? I'm just trying to change one column (Name), but using the second column (Country) to help identify the correct version to change it to. – Will T-E Aug 23 '16 at 13:58
  • I'm not sure how this question is a duplicate of the _"How to join (merge) data frames (inner, outer, left, right)?"_ question. I can see how that question can be used as part of a solution to this question, but I do not see how it is a complete solution. – Will T-E Aug 23 '16 at 14:00
  • @ShivaPrakash thanks, I've pasted `Name` and `Country` together and generated a `match` index for the relevant row in the map data frame; I just can't get my head around how to use that index to change the name. Any tips?! – Will T-E Aug 23 '16 at 14:51
  • It is a duplicate if we are trying to update values based on a lookup table. First merge on 2 columns (Name and Country), update the original columns (Name and Country) with the new ones (NameCorrect and CountryCorrect), and drop the merged lookup columns. No need for "unmerge", not sure what that means. – zx8754 Aug 23 '16 at 15:14
  • @WillT-E Add a new column with `df$corrected_name <- dfmap$NameCorrect[match(df$match_key, dfmap$match_key)]`. This should work. I was about to post an answer, but the question had already been marked as a duplicate, so I couldn't. – Shiva Prakash Aug 24 '16 at 10:54
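The composite-key approach from the comments can be sketched as follows. The key column name `match_key` comes from Shiva Prakash's comment; the `"|"` separator is an assumption (chosen because it is unlikely to appear in a name or country), and falling back to the original name when there is no match is an added safeguard, not part of the comment.

```r
df <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                 Country = c("China", "Germany", "England", "America", "India", "America"),
                 stringsAsFactors = FALSE)
dfmap <- data.frame(Name = c("Audrey", "Belinda", "Caroline", "Caroline", "Dina", "Erica"),
                    Country = c("China", "Germany", "England", "America", "India", "America"),
                    NameCorrect = c("Audrey", "Belinda", "Caroline", "Caz", "Dina", "Erica"),
                    CountryCorrect = c("China", "Germany", "England", "America", "India", "America"),
                    stringsAsFactors = FALSE)

# Composite key: Name and Country joined with a separator
df$match_key <- paste(df$Name, df$Country, sep = "|")
dfmap$match_key <- paste(dfmap$Name, dfmap$Country, sep = "|")

# Row of dfmap corresponding to each row of df (NA where there is no match)
idx <- match(df$match_key, dfmap$match_key)

# Use the index to pick the corrected name; keep the original name if unmatched
df$corrected_name <- ifelse(is.na(idx), df$Name, dfmap$NameCorrect[idx])
print(df[, c("Name", "Country", "corrected_name")])
```

To change `Name` in place instead of adding a column, assign the result to `df$Name` and then drop `match_key`.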
