I have a large data set in which I want to convert variables to numeric. The data was entered in Excel, with some items given a name that corresponds to a number (so the columns are of type character),
but sometimes the name has been misspelt or unusual terminology has been used. I need to identify all items in the data frame that are strings which cannot be converted to numeric.
Here is a MWE:
    num = data.frame(var1 = c("2", "green", "5"),
                     var2 = c("blue", "4", "9"),
                     var3 = c("ble", "4", "1"),
                     var4 = c("5", "7", "big"))
In this example, the output would be c("green", "blue", "ble", "big").
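To make the desired output concrete, here is a minimal base R sketch of one way it might be produced. It assumes that "could not be converted" means `as.numeric()` returns `NA` for the string. The names `vals`, `is_bad` and `bad_strings` are only illustrative.

```r
# Same example data as in the MWE
num <- data.frame(var1 = c("2", "green", "5"),
                  var2 = c("blue", "4", "9"),
                  var3 = c("ble", "4", "1"),
                  var4 = c("5", "7", "big"))

# Flatten every column into one character vector, column by column
vals <- unlist(lapply(num, as.character), use.names = FALSE)

# as.numeric() gives NA (with a warning) for strings it cannot parse;
# genuine missing values in the data are excluded from the result
is_bad <- !is.na(vals) & is.na(suppressWarnings(as.numeric(vals)))

# Distinct offending strings, in order of first appearance
bad_strings <- unique(vals[is_bad])
bad_strings
```

On the example data this should give the four strings listed above. On the real data, `table(vals[is_bad])` would additionally show how often each spelling occurs.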
I will then convert these to the relevant number as follows:
    library(dplyr)
    library(stringr)
    num %>%
      mutate(across(contains("var"), ~ str_replace_all(., c("green" = "3", "blue|ble" = "3.5", "big" = "10"))))
before converting the columns to numeric.
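For completeness, a rough sketch of how the replacement and the numeric conversion might be combined in one step. The `lookup` vector and the name `num_clean` are hypothetical. The patterns are anchored with `^` and `$` on the assumption that only whole-cell matches should be replaced, which is a change from the unanchored patterns above.

```r
library(dplyr)
library(stringr)

# Same example data as in the MWE
num <- data.frame(var1 = c("2", "green", "5"),
                  var2 = c("blue", "4", "9"),
                  var3 = c("ble", "4", "1"),
                  var4 = c("5", "7", "big"))

# Hypothetical lookup: regex pattern -> replacement, anchored to the whole string
lookup <- c("^green$" = "3", "^(blue|ble)$" = "3.5", "^big$" = "10")

# Replace the known names, then convert each column to numeric
num_clean <- num %>%
  mutate(across(contains("var"), ~ as.numeric(str_replace_all(.x, lookup))))

str(num_clean)
```

Any string that is still not covered by `lookup` would become `NA` with a warning at the `as.numeric()` step, which is why I want to find all of them first.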