Create list of all items in a data frame which can't be converted to number

Question

I have a large data set where I want to convert variables to numeric. The data has been entered in excel with some items given a name which corresponds to a number (so data is type character), but sometimes this has been mis-spelt, or unusual terminology used. I need to identify all items in the dataframe which are strings which could not be converted to numeric.

Here is MWE:

num = data.frame(var1 = c("2", "green", "5"),
                 var2 = c("blue","4",  "9"),
                 var3 = c("ble", "4", "1"),
                 var4 = c("5", "7", "big"))

In this example the output would be c("green","blue","ble", "big")

I will then convert these to the relevant number as follows:

num%>%
  mutate(across(contains("var"), ~ str_replace_all(., c("green" = "3", "blue|ble" = "3.5", "big" = "10"))))

before converting to numeric

You could try `unique(unlist(num)[is.na(as.numeric(unlist(num)))])` — Andrew Gustar, Oct 06 '22 at 10:19

s_baldur · Answer 1 · 2022-10-06T11:42:24.990

Unlist to create an atomic vector then select the elements with contain anything that is not a number.

grep("\\D", unlist(num, use.names = FALSE), value = TRUE)
# [1] "green" "blue"  "ble"   "big"

If you have decimal separators "." then the above will fail and smth like the following should be used

grep("^(\\d|\\.)+$", unlist(num, use.names = FALSE), value = TRUE, invert = TRUE)

Then you can create a lookup vector to replace them and convert to numeric:

lut = c("green" = 10, "blue" = 20, "ble" = 30, "big" = 40)
num[] <- lapply(num, \(x) as.numeric(ifelse(grepl("\\D", x), lut[x], x)))

#   var1 var2 var3 var4
# 1    2   20   30    5
# 2   10    4    4    7
# 3    5    9    1   40

Create list of all items in a data frame which can't be converted to number

1 Answers1