
I'm comparing and replacing missing values as part of my pipeline. Missing values are marked as -9 in my data.table.

Is there any danger in using -9L in the comparison?

> x <- -9
> typeof(x)
[1] "double"
>
> y <- -9L
> typeof(y)
[1] "integer"

Example:

dfmelt[value == -9L, code := paste0("0", "0")]  

versus:

dfmelt[value == -9, code := paste0("0", "0")]   
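For what it's worth, a quick sanity check on a small hypothetical data.table (names and values invented for illustration) suggests both literals match when the column holds exactly -9, since R coerces the integer literal to double before comparing:

```r
library(data.table)

# Hypothetical example: a double column using -9 as a missing-value sentinel
dt <- data.table(value = c(1, -9, 3))

dt[value == -9L, .N]  # integer literal: coerced to double before comparison
# [1] 1
dt[value == -9, .N]   # double literal: same result
# [1] 1
```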
Bas
    If the type of `value` is integer, this doesn't matter. If it is an assigned double, it doesn't matter either. If it is a double and a result of calculations, you shouldn't be using `==` anyway (due to floating point number precision). And of course, encoding `NA` values as a number is a sign of inferior software design. – Roland Apr 28 '16 at 08:03
    For further reading on the above comment and floating point error, [this is worth a glance](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf). – alistaire Apr 28 '16 at 08:07
    @Roland Thank you, can you write your comment down as an answer? And unfortunately we work a lot with ASCII files here. They come along with headaches! And thanks Alistaire, I've actually read part of that book already; maybe I should continue reading :) – Bas Apr 28 '16 at 08:16

1 Answer


If the type of value is integer, this doesn't matter. If it is an assigned double, it doesn't matter either. If it is a double and a result of calculations, you shouldn't be using == anyway (due to floating point number precision). And of course, encoding NA values as a number is a sign of inferior software design.
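To illustrate the floating-point caveat with a standard example (not from the original answer): a double produced by arithmetic can differ from the literal you compare against by a tiny rounding error, so `==` silently fails.

```r
# Classic floating point example: the computed value is not exactly 0.3
x <- 0.1 + 0.2
x == 0.3                    # FALSE: x is 0.30000000000000004
isTRUE(all.equal(x, 0.3))   # TRUE: all.equal compares with a tolerance
```

The same applies to a sentinel like -9: a computed value that prints as -9 may not be bit-identical to the literal -9.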

I suggest converting these values to NA during import:

read.table(text = "1,2,-9", sep = ",", na.strings = "-9")
#  V1 V2 V3
#1  1  2 NA

Then you can use is.na and avoid this problem.
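Sketching how the update from the question would then look with `is.na` (the contents of `dfmelt` here are hypothetical, standing in for your real data):

```r
library(data.table)

# Hypothetical dfmelt where -9 was already turned into NA at import time
dfmelt <- data.table(value = c(1, NA, 3), code = NA_character_)

# Flag missing values via is.na instead of comparing against a sentinel
dfmelt[is.na(value), code := paste0("0", "0")]
```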

Roland