I have a large dataset (60,000+ rows) that contains names. However, the names are written down in different formats, and to improve data quality I need to recode them into a single format. Instead of copy-pasting the recode command for every name, I would like to do this in a loop, for example. I have a list of all the incorrectly written names and a list of all the corresponding correctly written names.
So basically, what I want to do is: take name 1 in list1 and replace it with name 1 in list2, then take name 2 in list1 and replace it with name 2 in list2, etc. That doesn't seem like a big deal using gsub, but...
I seem to get close, but the output is still not what I want. Does anyone know why, or have a better solution than what I'm doing now?
EXAMPLE
> dput(list1)
c("Name1", "Name2", "Name3", "Name4", "Name5", "Name6", "Name7",
"Name8", "Name9", "Name10")
> dput(list2)
c("test1", "test2", "test3", "test4", "test5", "test6", "test7",
"test8", "test9", "test10")
I've added print commands to see what is actually happening, and it seems to work:
for (i in 1:length(list1)) {
  newlist <- gsub(paste0("\\<", list1[i], "\\>"), list2[i], list1)
  print(i)
  print(newlist[i])
}
[1] 1
[1] "test1"
[1] 2
[1] "test2"
[1] 3
[1] "test3"
[1] 4
[1] "test4"
[1] 5
[1] "test5"
[1] 6
[1] "test6"
[1] 7
[1] "test7"
[1] 8
[1] "test8"
[1] 9
[1] "test9"
[1] 10
[1] "test10"
But then, when I check what newlist actually looks like:
> newlist
[1] "Name1" "Name2" "Name3"
[4] "Name4" "Name5" "Name6"
[7] "Name7" "Name8" "Name9"
[10] "test10"
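For comparison, here is a minimal sketch of the same loop in which each gsub() call works on the result of the previous one instead of on the untouched list1. In the loop above, every iteration overwrites newlist with a fresh copy of list1 in which only name i has been replaced, so after the last iteration only "test10" remains. The sketch rebuilds list1 and list2 with paste0() to match the dput() output; on the real data, newlist would instead start as the column that needs recoding.

```r
# Rebuild the example vectors from the dput() output above
list1 <- paste0("Name", 1:10)   # wrongly written names
list2 <- paste0("test", 1:10)   # corresponding correct names

# Start from a copy of the original names and feed the result of each
# gsub() call into the next one, so earlier replacements are kept.
# \\< and \\> are word boundaries, so "Name1" does not match inside "Name10".
newlist <- list1
for (i in seq_along(list1)) {
  newlist <- gsub(paste0("\\<", list1[i], "\\>"), list2[i], newlist)
}

print(newlist)
```

Because the replacements are chained, a corrected name that also appears later in list1 would be replaced a second time. Names containing regex metacharacters such as "." or "(" would also need escaping before being pasted into the pattern.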
I have also tried using lapply and writing my own function, but none of it worked out the way I wanted :(
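If each value in the column is exactly one name (rather than a name embedded in a longer string), a lookup with match() avoids both the loop and the regular expressions. The sketch below assumes a hypothetical data frame df with a character column name and a hypothetical helper recode_names(); it is one possible approach under those assumptions, not the only one.

```r
# Hypothetical data: a data frame with a 'name' column to clean
list1 <- paste0("Name", 1:10)   # wrongly written names
list2 <- paste0("test", 1:10)   # corresponding correct names
df <- data.frame(name = c("Name3", "Name10", "Other", "Name1"),
                 stringsAsFactors = FALSE)

# recode_names(): hypothetical helper that replaces each value found in
# 'from' by the value at the same position in 'to'; values not found
# in 'from' are left unchanged
recode_names <- function(x, from, to) {
  idx <- match(x, from)   # position of each x in 'from', NA if absent
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]
  x
}

df$name <- recode_names(df$name, list1, list2)
print(df$name)
```

Every value is looked up once against the original spellings, so replacements cannot cascade, and the whole column is handled in a single vectorised call instead of one pass per name. If the column is a factor, convert it with as.character() first, since assigning new strings into a factor produces NA.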