The problem:
Let us consider a data frame df:
df <- structure(list(id = 1:4, var1 = c("blissard", "Blizzard", "storm of snow",
"DUST DEVIL/BLIZZARD")), .Names = c("id", "var1"), class = "data.frame", row.names = c(NA,
-4L))
> df
id var1
1 "blissard"
2 "Blizzard"
3 "storm of snow"
4 "DUST DEVIL/BLIZZARD"
> class(df$var1)
[1] "character"
I would like to make it tidy, so I recode var1, which has four different entries, into a cleaner and more analysable var1_recoded:
df$var1_recoded[grepl("[Bb][Ll][Ii]", df$var1)] <- "blizzard"
df$var1_recoded[grepl("[Ss][Tt][Oo]", df$var1)] <- "storm"
id var1 var1_recoded
1 "blissard" "blizzard"
2 "Blizzard" "blizzard"
3 "storm of snow" "storm"
4 "DUST DEVIL/BLIZZARD" "blizzard"
The question:
How can I create a function that automates the process performed by the two statements above? In other words: how could this be generalized to (let's say) 1000 replacements?
I would pass the function a vector (such as c("storm", "blizzard")) and have it apply the matching and replacing to every observation that satisfies the condition.
I found a valuable contribution here: Replace multiple arguments with gsub, but I am not able to translate the process described above into R code programmatically. In particular, I cannot build the condition that lets grep recognize the first three letters of the word to match.
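To make the goal concrete, here is a rough sketch of the kind of function I have in mind. The name recode_by_prefix and its n argument are hypothetical, not taken from any package. It assumes each target word contains only plain letters in its first three characters (so the prefix is safe to use as a regular expression), and that when several targets match the same entry, the later one in the vector wins. Like the two statements above, it matches the prefix anywhere in the string, not only at the start of a word.

```r
# Hypothetical helper (not from any package): recode each entry of x to the
# target word whose first n letters occur in it, ignoring case.
recode_by_prefix <- function(x, targets, n = 3) {
  out <- rep(NA_character_, length(x))   # entries with no match stay NA
  for (target in targets) {
    prefix <- substr(target, 1, n)       # e.g. "bli" for "blizzard"
    hit <- grepl(prefix, x, ignore.case = TRUE)
    out[hit] <- target                   # later targets overwrite earlier ones
  }
  out
}

df <- data.frame(
  id = 1:4,
  var1 = c("blissard", "Blizzard", "storm of snow", "DUST DEVIL/BLIZZARD"),
  stringsAsFactors = FALSE
)
df$var1_recoded <- recode_by_prefix(df$var1, c("blizzard", "storm"))
```

Using ignore.case = TRUE stands in for the hand-written [Bb][Ll][Ii] character classes, so the pattern can be derived directly from each target word.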