This must surely have been asked before, and I apologize if this is a duplicate, but I did not find a previous relevant question.

Suppose I have, as a column in a data.table read from a file, a vector of strings, with each string containing either a number (e.g. "3.14"), or indicating that no number was observed (e.g. "not observed"). I would like to convert this column to numeric, and replace entries that say "not observed" with a default, such as 10 (n.b. - this is an example, I'd like a solution that also works for any other numeric value).

If I say

library(data.table)

# Toy data
data <- data.table(Column = c("3.14", "2.718", "not observed"))

data[, Column := fifelse(Column == "not observed", 10, as.numeric(Column))]

I get the correct result, but I also get the warning "NAs introduced by coercion", presumably because as.numeric(Column) is evaluated for the entire column, including the cells containing the string "not observed", before fifelse() ("fast if/else", from the data.table package) runs.
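A minimal base-R sketch consistent with that reading, using only the toy values above: the conversion on its own already raises the warning, with no fifelse() involved. The variable names `x` and `msg` are just for illustration.

```r
x <- c("3.14", "2.718", "not observed")

# Convert the whole vector and capture the warning message, if any,
# instead of letting it print to the console
msg <- tryCatch(
  {
    as.numeric(x)
    NA_character_          # reached only if no warning was raised
  },
  warning = function(w) conditionMessage(w)
)

msg  # the text of the coercion warning (wording depends on the R locale)
```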

The warning is harmless but ugly, and I'd like to get rid of it. I'm no fan of suppressing warnings, however, and would like to find an alternative way of doing the same thing. Is there one? Ideally I'd like to keep using data.table syntax, e.g. :=, since my real data.table is (much) taller, and execution times matter.

Thank you!

BestGirl
  • Does this answer your question? How to avoid warning when introducing NAs by coercion https://stackoverflow.com/questions/14984989/how-to-avoid-warning-when-introducing-nas-by-coercion – M Aurélio Feb 15 '23 at 12:23
  • Something like this? `data[data$Column == "not observed"] <- 10; data$Column <- as.numeric(data$Column)`? – jpsmith Feb 15 '23 at 12:31

2 Answers

data[, Column := as.numeric(fifelse(Column == "not observed", "10", Column))]
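The one-liner above avoids the warning because the placeholder is swapped for "10" while the column is still character, so as.numeric() never sees a non-numeric string. If the default should not pass through a string first, one possible variant is to convert only the elements that are not the placeholder. The sketch below is base R; the helper name `to_numeric_default` and its arguments `missing_label` and `default` are hypothetical, not part of data.table.

```r
# Hypothetical helper: convert a character vector to numeric, substituting
# `default` wherever the element equals `missing_label`
to_numeric_default <- function(x, missing_label = "not observed", default = 10) {
  # TRUE only for non-NA elements equal to the placeholder
  is_missing <- !is.na(x) & x == missing_label
  out <- rep(NA_real_, length(x))
  # as.numeric() is applied only to the remaining elements, so the
  # placeholder never reaches it and cannot trigger the coercion warning
  out[!is_missing] <- as.numeric(x[!is_missing])
  out[is_missing] <- default
  out
}

to_numeric_default(c("3.14", "2.718", "not observed"))

# Possible use with data.table (assumes the toy `data` from the question):
# data[, Column := to_numeric_default(Column)]
```

Any other non-numeric string would still produce the warning, which may be desirable since it flags unexpected input.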
Nir Graham

as.numeric.nowarn accepts a character vector. It muffles the coercion warning but lets any other warning through. Components that cannot be converted to numeric are replaced with the specified default value (NA if none is given); components that were already NA stay NA.

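as.numeric.nowarn <- function(x, default = NA) {
  # Muffle only the coercion warning; other warnings propagate as usual
  y <- withCallingHandlers(as.numeric(x), warning = function(w) {
    if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE))
      invokeRestart("muffleWarning")
  })
  # Use the default where conversion failed but the input was not NA
  replace(y, is.na(y) & !is.na(x), default)
}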

library(data.table)
data <- data.table(Column = c("3.14", "2.718", "not observed")) # from question

data[, Column := as.numeric.nowarn(Column, 10)]

data
##    Column
## 1:  3.140
## 2:  2.718
## 3: 10.000
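
A small base-R check of the edge cases described above: a genuine NA in the input stays NA, while an unconvertible string gets the default. A version of the function is included so the snippet runs on its own; the input vector `x` is made up for illustration. Note that the handler matches the English warning text, so under a translated R locale the warning would presumably not be muffled, although the returned values would be the same.

```r
as.numeric.nowarn <- function(x, default = NA) {
  y <- withCallingHandlers(as.numeric(x), warning = function(w) {
    if (grepl("NAs introduced by coercion", conditionMessage(w), fixed = TRUE))
      invokeRestart("muffleWarning")
  })
  replace(y, is.na(y) & !is.na(x), default)
}

x <- c("3.14", NA, "not observed")  # hypothetical input with a real NA
res <- as.numeric.nowarn(x, 10)
res  # second element remains NA, third becomes 10
```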
G. Grothendieck