
I have a data table A with a column right_date. Looking at that column, there appear to be 486 missing values in it. But when I drop the rows with missing values in right_date using na.omit(A, cols="right_date"), as shown on the R documentation page, 1,156 rows are dropped.

I do not know why this is happening. DataCombine::DropNA(DT, Var="") seems to be consistent with the missing values in right_date_vect alone: it drops 486 rows.

Here is the data if you want to try: https://drive.google.com/file/d/1diq9ctwen6jqfRFlV24qG8PKqdFcBqhu/view?usp=sharing


OverFlow Police
  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide code snippets and sample data in your question rather than pictures or links to other websites. – mnist Nov 22 '19 at 23:10
  • Does `is.data.table(A)` return TRUE? My guess is that it does not. `na.omit()` is a generic, so the `cols` argument is silently discarded if the object is only a data.frame or matrix. – smingerson Nov 22 '19 at 23:17
  • @smingerson you are right! There should be an error or warning or something. – OverFlow Police Nov 23 '19 at 00:22

1 Answer


The object "A" in the question was not, in fact, a data.table. na.omit() is an S3 generic, and extra arguments can get eaten by the dots. So, while no error was thrown, either na.omit.default() (for a matrix) or na.omit.data.frame() was called instead of the data.table method. Both ignore cols and omit every row that has an NA in any column.
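To make the difference concrete, here is a minimal sketch on a toy table. The data and the `other` column are made up for illustration and are not the questioner's file; it assumes data.table is installed.

```r
library(data.table)

# Hypothetical toy data: one NA in right_date, one NA in another column
df <- data.frame(
  right_date = c("2019-01-01", NA, "2019-01-03", "2019-01-04"),
  other      = c(1, 2, NA, 4)
)

# data.frame method: `cols` is swallowed by `...`,
# so every row with an NA anywhere is dropped
n_df <- nrow(na.omit(df, cols = "right_date"))
n_df
#> [1] 2

# data.table method: `cols` is honoured,
# so only rows with an NA in right_date are dropped
dt <- as.data.table(df)
n_dt <- nrow(na.omit(dt, cols = "right_date"))
n_dt
#> [1] 3
```

The same call returns a different number of rows depending only on the class of the object, which matches the 486 versus 1,156 discrepancy in spirit.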

This is a curse of the S3 system which can bite you. When I receive unexpected output, the first thing I do is execute the function name without parentheses at the console, for example na.omit. This prints the function definition. If I see something like UseMethod("na.omit"), that indicates the behavior differs by class, so I then check the class of my object.
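As a rough sketch of that checking routine, using a small made-up data.frame named `A` as a stand-in for the object in the question:

```r
# Toy object standing in for the questioner's table (hypothetical data)
A <- data.frame(right_date = c("2019-01-01", NA), other = c(NA, 1))

na.omit             # prints the definition; the body calls UseMethod("na.omit")
class(A)            # "data.frame", so na.omit.data.frame() is what gets dispatched
methods("na.omit")  # lists the methods currently registered for the generic

# Look at the arguments of the method that will actually run:
# there is no `cols` here, only `object` and `...`
method_args <- names(formals(getS3method("na.omit", "data.frame")))
method_args
#> [1] "object" "..."
```

Seeing only `object` and `...` in the method's arguments is the hint that anything else passed in, such as `cols`, will be ignored without complaint.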

The R package ellipsis is aimed at addressing this deficiency. Below is a way to stop this from happening again (based heavily on that package's README):

library(data.table)
library(ellipsis)
mat <- matrix(c(1, 2, 3, NA), nrow = 2)
colnames(mat) <- c("a", "b")
safe_na.omit <- function(object, ...) {
  check_dots_used()
  na.omit(object, ...)
}

safe_na.omit(mat, col = "a")
#> Error: 1 components of `...` were not used.
#> 
#> We detected these problematic arguments:
#> * `col`
#> 
#> Did you misspecify an argument?

dt <- as.data.table(mat)
safe_na.omit(dt, cols = "a")
#>    a  b
#> 1: 1  3
#> 2: 2 NA
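
If you would rather not depend on S3 dispatch at all, one option is to index on the column explicitly. This is only a sketch on made-up data (the `other` column and the values are hypothetical), but the same logical index should behave the same way for a data.frame and a data.table:

```r
library(data.table)

# Hypothetical toy data with NAs in two different columns
df <- data.frame(
  right_date = c("2019-01-01", NA, "2019-01-03"),
  other      = c(NA, 2, 3)
)
dt <- as.data.table(df)

# Explicit logical index on the one column of interest
keep <- !is.na(df$right_date)

n_df <- nrow(df[keep, ])  # only the row with a missing right_date is dropped
n_dt <- nrow(dt[keep, ])  # same result for the data.table
c(n_df, n_dt)
#> [1] 2 2
```

Because nothing is passed through `...`, there is no argument that can be silently discarded.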
smingerson