
How can I check whether an element of a character vector can be converted to numeric? To be more precise: when the element is a float or an integer, it can be converted to numeric without any problems, but when it is a non-numeric string, `as.numeric` returns NA and gives the warning “NAs introduced by coercion”. I was able to check this indirectly via the index of the NA value. However, it would be much cleaner to do this without getting a warning.

cat1 <- c("1.12354","1.4548","1.9856","some_string")
cat2 <- c("1.45678","1.1478","1.9565","1.32315")
target <- c(0,1,1,0)
df <- data.frame(cat1, cat2, target)
catCols <- c("cat1", "cat2")

for(col in catCols){
  a <- as.numeric(unique(df[[col]]))
  # the indices printed below refer to positions in unique(df[[col]]), not to rows of df
  if(length(which(is.na(a))) != 0){
    print(col)
    print(which(is.na(a)))
  }
}

Mine
  • What is your goal? The `as.numeric` function warns you if you have some non-coercible values. Do you just want to suppress the warnings? – nicola May 14 '21 at 09:49
  • Does this answer your question? [Test for numeric elements in a character string](https://stackoverflow.com/questions/13638377/test-for-numeric-elements-in-a-character-string) – slamballais May 14 '21 at 09:49
  • You either convert the entire vector to numeric, or you leave it as character. What do you want to achieve here? – Tim Biegeleisen May 14 '21 at 09:50
  • @TimBiegeleisen I want to determine the element and the column this warning occurs at – Mine May 14 '21 at 09:58
  • @nicola The goal is to determine the value and the column this warning occurs at, preferably using another method that does not produce the warning. – Mine May 14 '21 at 10:00

2 Answers


Perhaps you can use a regex to check whether all the values in a column are integers or floats.

can_convert_to_numeric <- function(x) {
  all(grepl('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE))  
}

sapply(df[catCols], can_convert_to_numeric)
# cat1  cat2 
#FALSE  TRUE 
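Note that this pattern is stricter than `as.numeric` itself, which also accepts, for example, scientific notation such as "1e5" and a trailing decimal point such as "1.". A quick comparison, using an illustrative vector `x` made up for this check (not taken from the question):

```r
# Pattern copied from the answer above
num_pattern <- "^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$"

# Illustrative values chosen to show where the regex and as.numeric() disagree
x <- c("1e5", "1.", "-0.5", "abc")

# TRUE where the regex accepts the string
regex_ok <- grepl(num_pattern, x, perl = TRUE)

# TRUE where as.numeric() produces a non-NA number (warning suppressed)
coerce_ok <- !is.na(suppressWarnings(as.numeric(x)))

# regex_ok:  FALSE FALSE TRUE FALSE
# coerce_ok: TRUE  TRUE  TRUE FALSE
```

So a column can be rejected by the regex even though `as.numeric` would convert it without a warning; whether that matters depends on which number formats you expect in the data.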

Alternatively, to get the values that cannot be converted to numeric, we can use grep with invert = TRUE:

values_which_cannot_be_numeric <- function(x) {
  grep('^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$', x, perl = TRUE, invert = TRUE, value = TRUE)
}

lapply(df[catCols], values_which_cannot_be_numeric)

#$cat1
#[1] "some_string"

#$cat2
#character(0)
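Since the comments ask for the position of the offending element, a small variant can drop value = TRUE so that grep returns indices instead of values. This is a sketch using the question's data; `positions_which_cannot_be_numeric` is a hypothetical name:

```r
cat1 <- c("1.12354", "1.4548", "1.9856", "some_string")
cat2 <- c("1.45678", "1.1478", "1.9565", "1.32315")
df <- data.frame(cat1, cat2, stringsAsFactors = FALSE)
catCols <- c("cat1", "cat2")

# Same pattern as above; without value = TRUE, grep() returns the positions
# of the elements that do NOT match (because of invert = TRUE)
positions_which_cannot_be_numeric <- function(x) {
  grep("^(?=.)([+-]?([0-9]*)(\\.([0-9]+))?)$", x, perl = TRUE, invert = TRUE)
}

bad_pos <- lapply(df[catCols], positions_which_cannot_be_numeric)
# bad_pos$cat1 is 4; bad_pos$cat2 is integer(0)
```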

Regex taken from here.


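If you use type.convert, you don't have to worry about this at all: it converts a column to numeric only when every non-missing value can be parsed as a number, and otherwise (with as.is = TRUE) leaves it as character.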

df <- type.convert(df, as.is = TRUE)
str(df)

#'data.frame':  4 obs. of  3 variables:
# $ cat1  : chr  "1.12354" "1.4548" "1.9856" "some_string"
# $ cat2  : num  1.46 1.15 1.96 1.32
# $ target: int  0 1 1 0
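To find out which columns type.convert could not convert, one option is to check which of the candidate columns are still non-numeric afterwards. A sketch on the question's data; `not_numeric` is just an illustrative name:

```r
cat1 <- c("1.12354", "1.4548", "1.9856", "some_string")
cat2 <- c("1.45678", "1.1478", "1.9565", "1.32315")
target <- c(0, 1, 1, 0)
df <- data.frame(cat1, cat2, target, stringsAsFactors = FALSE)
catCols <- c("cat1", "cat2")

df <- type.convert(df, as.is = TRUE)

# Columns that are still not numeric after type.convert() contain
# at least one value that could not be parsed as a number
not_numeric <- catCols[!vapply(df[catCols], is.numeric, logical(1))]
not_numeric
# [1] "cat1"
```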
Ronak Shah

One solution is to write a function that returns the indices of the NA values and apply it to the columns you want. Wrapping as.numeric in suppressWarnings keeps the coercion warning from being printed.

check_num <- function(x){
  y <- suppressWarnings(as.numeric(x))
  if(anyNA(y)){
    which(is.na(y))
  } else invisible(NULL)
}
lapply(df[catCols], check_num)
#$cat1
#[1] 4
#
#$cat2
#NULL
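One caveat: is.na(y) is also TRUE for values that were already NA before the conversion, so check_num reports those positions too. If only the values that failed to convert are of interest, a sketch (`check_num_new_na` is a hypothetical name, and `x` is an illustrative vector) can exclude the existing NAs:

```r
# Hypothetical variant: flag only values that became NA because of the
# conversion, not values that were already NA in the input
check_num_new_na <- function(x){
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y) & !is.na(x))
}

x <- c("1.5", NA, "abc", "2")
check_num_new_na(x)
# returns 3: position 2 was already NA, so it is not reported
```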

The function above returns NULL if all values can be converted to numeric. The next function uses the same method to determine which vector elements cannot be converted, but returns integer(0) if all of them can be converted.

check_num2 <- function(x){
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y))
}
lapply(df[catCols], check_num2)
#$cat1
#[1] 4
#
#$cat2
#integer(0)
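To get the column, row and value together, as asked in the comments, the per-column indices from check_num2 can be collected into one data frame. This is a sketch assuming df holds character columns (stringsAsFactors = FALSE is set explicitly so older R versions behave the same); `bad` is an illustrative name:

```r
cat1 <- c("1.12354", "1.4548", "1.9856", "some_string")
cat2 <- c("1.45678", "1.1478", "1.9565", "1.32315")
target <- c(0, 1, 1, 0)
df <- data.frame(cat1, cat2, target, stringsAsFactors = FALSE)
catCols <- c("cat1", "cat2")

check_num2 <- function(x){
  y <- suppressWarnings(as.numeric(x))
  which(is.na(y))
}

idx <- lapply(df[catCols], check_num2)

# One row per offending value: column name, row number and original value.
# rep() keeps the column name the same length as rows, so columns with
# no offending values contribute a zero-row data frame.
bad <- do.call(rbind, lapply(names(idx), function(col) {
  rows <- idx[[col]]
  data.frame(column = rep(col, length(rows)),
             row = rows,
             value = df[[col]][rows],
             stringsAsFactors = FALSE)
}))
bad
```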
Rui Barradas