I have gone through many threads on this, but none of them seem to have the solution I need. Here is how my question differs from the suggested posts.

I have two data frames:

> main <- data.frame(V1 = factor(c("A","B","C","C","D","E","A")))
> main
V1
 A
 B
 C
 C
 D
 E
 A

> lookup <- data.frame(V1=c("A","B","C"),V2=c("aa","bb","cc"))
> lookup
V1 V2
 A aa
 B bb
 C cc

What I need is to use lookup to update main while leaving the unmatched values as they are. Many of the answers use match, but it creates an NA for unmatched levels. For example, one of the solutions was:

> main$V1=lookup[match(main$V1,lookup$V1),"V2"]
> main
   V1
   aa
   bb
   cc
   cc
 <NA>
 <NA>
   aa

The desired outcome is to leave the unmatched ones unchanged:

V1
aa 
bb
cc
cc
 D
 E
aa

That was just an example; my real datasets are a lot bigger, so replacing values one by one is really not an option. Any help or pointers will be greatly appreciated. Thanks much!

H.Hung
  • See [this answer](https://stackoverflow.com/a/49195435/680068) in linked post. This should work: `library(tidyverse); main %>% mutate(V1 = fct_recode(V1,"aa" = "A","bb" = "B","cc" = "C"))` – zx8754 Jan 12 '21 at 08:01
  • Thanks, but my actual datasets are a lot bigger, so replacing them one by one is really not an option... – H.Hung Jan 12 '21 at 20:18

1 Answer

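You can use the match output to conditionally change values.

# Row of lookup matching each value of main$V1; NA where there is no match
inds <- match(main$V1, lookup$V1)
# Overwrite only the matched positions (this assumes V1 is character; a value
# that is not an existing factor level would be turned into NA)
main$V1[!is.na(inds)] <- lookup$V2[na.omit(inds)]
main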

#  V1
#1 aa
#2 bb
#3 cc
#4 cc
#5  D
#6  E
#7 aa
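Since `main$V1` in the question is a factor, here is a minimal end-to-end sketch of the same match idea that works on a character copy of the column first. The intermediate name `new_v1` is only illustrative, and `stringsAsFactors = FALSE` is set explicitly so that `lookup` holds character columns on any R version.

```r
# Rebuild the example data from the question.
main <- data.frame(V1 = factor(c("A", "B", "C", "C", "D", "E", "A")))
lookup <- data.frame(V1 = c("A", "B", "C"),
                     V2 = c("aa", "bb", "cc"),
                     stringsAsFactors = FALSE)

# Position of each main value in the lookup key; NA where there is no match.
inds <- match(as.character(main$V1), lookup$V1)

# Work on a character copy so new values are not rejected as invalid levels.
new_v1 <- as.character(main$V1)
matched <- !is.na(inds)
new_v1[matched] <- lookup$V2[inds[matched]]

# Convert back to a factor only if the column has to stay a factor.
main$V1 <- factor(new_v1)
main
```

Unmatched values ("D" and "E" here) are carried over unchanged because only the matched positions are overwritten.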

You can also use the join approach:

library(dplyr)
main %>%
  left_join(lookup, by = 'V1') %>%      # adds V2, NA where V1 has no match
  transmute(V1 = coalesce(V2, V1))      # prefer V2, fall back to the original V1
Ronak Shah
  • Thanks much. So it seems like your solution will work if the variable is `character` but it does not work for `factor`. Is there any way to do it without changing the data format? – H.Hung Jan 12 '21 at 07:11
  • For `factors` you need to include all its levels in the column. Run `levels(main$V1) <- union(levels(main$V1), levels(lookup$V2))` and then run the above answer. – Ronak Shah Jan 12 '21 at 07:16