
I want to identify missing values in an R data.table.

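For each column in the dataset, I want the values of the "id" column for the rows where that column has a missing value.

I use apply(is.na(dt_tb), 2, which). This script gives me the row positions of the missing values; I would like to replace each position with the corresponding value from the id column.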

library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA),
                    stringsAsFactors = FALSE)

apply(is.na(dt_tb), 2, which)

Example output:

$id
integer(0)

$coll
[1] 2

$cyy
integer(0)

$hhh
[1] 4

I want:

$id
integer(0)

$coll
[1] 6 7

$cyy
integer(0)

$hhh
[1] 15

Mathieu L

3 Answers


You can use unlist to flatten the row positions, look up the matching values in dt_tb$id, and then use relist to restore the original list structure.

i <- apply(is.na(dt_tb) | dt_tb=="NA", 2, which)
relist(dt_tb$id[unlist(i)], i)
#$id
#numeric(0)
#
#$coll
#[1] 6 7
#
#$cyy
#numeric(0)
#
#$hhh
#[1] 15
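
One caveat with the apply() approach: apply() only returns a list when the columns have different numbers of matches; if every column had the same number, it would simplify the result to a vector or matrix. A minimal sketch of a variant that always returns a named list, assuming the same dt_tb as in the question and using lapply() over the columns (the name missing_ids is only illustrative):

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# For each column, keep the ids of the rows that are NA or the literal
# string "NA". %in% never returns NA, so the logical index is always
# TRUE or FALSE.
missing_ids <- lapply(dt_tb, function(col) {
  dt_tb$id[is.na(col) | col %in% "NA"]
})

missing_ids
```

Because lapply() walks the columns one at a time, the result keeps one element per column, including empty ones.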
GKi

You can use which with arr.ind = TRUE to get row and column index where NA or "NA" is present. You can then use split to get a named list.

mat <- which(is.na(dt_tb) | dt_tb == 'NA', arr.ind = TRUE)
split(dt_tb$id[mat[, 1]], names(dt_tb)[mat[, 2]])

#$coll
#[1] 6 7

#$hhh
#[1] 15
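
The split() result above drops columns without missing values, whereas the question also lists id and cyy as empty entries. A possible sketch that keeps them, assuming the same dt_tb as in the question: split on a factor whose levels are all column names, since split() keeps unused factor levels by default (the name res is only illustrative).

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# Row/column positions of real NAs and of the literal string "NA"
mat <- which(is.na(dt_tb) | dt_tb == "NA", arr.ind = TRUE)

# Using a factor with every column name as a level keeps columns
# that have no missing values as empty list elements.
col_f <- factor(names(dt_tb)[mat[, "col"]], levels = names(dt_tb))
res <- split(dt_tb$id[mat[, "row"]], col_f)

res
```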
Ronak Shah

You can use complete.cases(dt_tb):

# install.packages("data.table")  # run once if the package is not installed
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA,"NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA),
                    stringsAsFactors = FALSE)


complete.cases(dt_tb) # returns: TRUE FALSE TRUE FALSE (the string "NA" in row 3 is not treated as missing)

which(!complete.cases(dt_tb)) # returns the row numbers: 2 4

dt_tb[!complete.cases(dt_tb), ] # returns the rows that contain missing values

Update:

dt_tb[which(!complete.cases(dt_tb)), 1] # returns the ids

   id
1:  6
2: 15
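
Since complete.cases() only sees real NA values, id 7 (the string "NA") is not reported above. A rough sketch of one way to include it, assuming the same dt_tb as in the question: work on a copy, convert the literal "NA" strings in character columns to real NAs with data.table's set(), and then apply complete.cases(). The names cleaned and chr_cols are only illustrative.

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# Work on a copy so the original table is left untouched
cleaned <- copy(dt_tb)

# Replace the literal string "NA" with a real NA in every character column
chr_cols <- names(cleaned)[sapply(cleaned, is.character)]
for (col in chr_cols) {
  set(cleaned, i = which(cleaned[[col]] == "NA"), j = col, value = NA_character_)
}

# ids of all rows with at least one missing value
incomplete_ids <- cleaned[!complete.cases(cleaned), id]
incomplete_ids
```

Note that this still reports ids per row, not per column, so it answers a slightly different question from the one asked.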