My problem: I am called on to compare pesticide lists that can be anywhere from 100 to 500 compounds long. I have no problem importing and spreading them but, if the names do not match, the columns do not align. Naming is creative sport in the pesticide world. Endosulfan or Endosulphan. op-DDT or DDT (o,p).
My view was that if I created a dictionary in Excel where the first column held a preferred name (pref) and the columns to the right held alternatives (up to five) I could run through the pesticide list to standardise the naming before spreading it and then get alignment.
I tried creating a string of the alternatives, omitting the empty fields and then using sub to do the replacement.
For example, I set my preferred name as
pref <- "HCH-gamma (Lindane)"
and a string of alternatives as
check_list <- "BHC-gamma (Lindane)|BHC - gamma (Lindane)|Lindane"
and then ran a loop through a df of names with
Combined$Name[i] <- sub(check_list, pref, Combined$Name[i])
What started out as name <- c("HCH-gamma (Lindane)","BHC-gamma (Lindane)","BHC - gamma (lindane)","Lindane")
should end up as
name <- c("HCH-gamma (Lindane)","HCH-gamma (Lindane)","HCH-gamma (Lindane)","HCH-gamma (Lindane"))
But didn't. The results were weird, such as
"BHC - gamma (HCH-gamma (Lindane))";
Clearly I do not have the grammar correct but it is the first time I have tried string manipulation like this and cannot fathom out what I am doing wrong. Any guidance would be appreciated. Or is there a better way to do it?