Identify missing values in an R data table

Question

I want to identify missing values in an R data.table.

For each column in the dataset, I want the value of the "id" column for the rows where that column is missing.

I use apply(is.na(dt_tb), 2, which). This script gives me the row positions; I would like to replace each position with the corresponding id number (from the id column).

library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA),
                    stringsAsFactors = FALSE)

apply(is.na(dt_tb), 2, which)

Example output:

$id
integer(0)

$coll
[1] 2

$cyy
integer(0)

$hhh
[1] 4

I want:

id integer(0)

coll 6 7

cyy integer(0)

hhh 15

score 2 · Accepted Answer · answered Jun 16 '20 at 10:20

You can use unlist to get the ids from dt_tb$id and relist to restore the original list structure.

i <- apply(is.na(dt_tb) | dt_tb=="NA", 2, which)
relist(dt_tb$id[unlist(i)], i)
#$id
#numeric(0)
#
#$coll
#[1] 6 7
#
#$cyy
#numeric(0)
#
#$hhh
#[1] 15
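One caveat with the apply approach: apply simplifies its result to a vector or matrix whenever every column yields the same number of hits, so a list is not guaranteed for other data. A minimal sketch of a variant that iterates over the columns with lapply, which always returns a named list. The name ids_by_col is only illustrative, and the string "NA" is treated as missing, as in the code above.

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# For each column, find the rows that are NA or the literal string "NA",
# then look up the id of those rows. lapply never simplifies its result.
ids_by_col <- lapply(dt_tb, function(col) {
  dt_tb$id[which(is.na(col) | col == "NA")]
})

ids_by_col
```

Columns without missing values come back as numeric(0), so every column name stays in the result.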

GKi

Thanks. How can I return a data frame or data table? – Mathieu L Jun 17 '20 at 08:55
Have a look at: [Convert a list to a data frame](https://stackoverflow.com/q/4227223/10488504) – GKi Jun 17 '20 at 09:04
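
For this particular result, one possible way to get a data.table is to build a long table with one row per missing value. This is only a sketch: the names res, out and column are illustrative, and the layout (long rather than wide) is an assumption about what is wanted.

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# Named list of ids per column, built as in the answer above
i <- apply(is.na(dt_tb) | dt_tb == "NA", 2, which)
res <- relist(dt_tb$id[unlist(i)], i)

# Repeat each column name once per hit and flatten the ids alongside it
out <- data.table(column = rep(names(res), lengths(res)),
                  id = unlist(res, use.names = FALSE))
out
```

Columns with no missing values contribute no rows to out.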

score 1 · Answer 2 · answered Jun 16 '20 at 11:07

You can use which with arr.ind = TRUE to get the row and column indices where NA or "NA" is present. You can then use split to get a named list.

mat <- which(is.na(dt_tb) | dt_tb == 'NA', arr.ind = TRUE)
split(dt_tb$id[mat[, 1]], names(dt_tb)[mat[, 2]])

#$coll
#[1] 6 7

#$hhh
#[1] 15
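
The split result above omits columns that have no missing values. If those should appear as empty entries, as in the question's desired output, one option is to split on a factor whose levels are all the column names. A sketch under that assumption; by_col is an illustrative name.

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# Row/column positions of every NA or "NA" cell
mat <- which(is.na(dt_tb) | dt_tb == "NA", arr.ind = TRUE)

# A factor with all column names as levels: split() keeps empty levels
# by default (drop = FALSE), so columns without hits give numeric(0)
grp <- factor(names(dt_tb)[mat[, "col"]], levels = names(dt_tb))
by_col <- split(dt_tb$id[mat[, "row"]], grp)
by_col
```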

1k monkeys and a single PC · Answer 3 · 2020-06-16T11:03:34.473

You can use complete.cases(dt_tb). Note that it only detects real NA values, not the string "NA".

# install.packages("data.table")  # run once if the package is not installed
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA,"NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA),
                    stringsAsFactors = FALSE)


complete.cases(dt_tb) # returns: TRUE FALSE  TRUE FALSE

which(!complete.cases(dt_tb)) # returns row numbers: 2 4

dt_tb[!complete.cases(dt_tb),] # returns: rows with missing data/na's

Update:

dt_tb[which(!complete.cases(dt_tb)), 1] # returns the ids

   id
1:  6
2: 15
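
The result above does not include id 7, because complete.cases does not see the string "NA" in coll. If that string should count as missing, as the other answers assume, one way is to convert it to a real NA first. A minimal sketch; cleaned and missing_ids are illustrative names, and only the coll column is converted here.

```r
library(data.table)

dt_tb <- data.table(id = c(5, 6, 7, 15),
                    coll = c("this", NA, "NA", "text"),
                    cyy = c(TRUE, FALSE, TRUE, TRUE),
                    hhh = c(2.5, 4.2, 3.2, NA))

# Work on a copy so the original table is not modified by reference
cleaned <- copy(dt_tb)

# Turn the literal string "NA" into a real missing value
cleaned[coll == "NA", coll := NA_character_]

# ids of all rows with at least one missing value
missing_ids <- cleaned[!complete.cases(cleaned), id]
missing_ids
```

This still reports rows rather than columns, so it does not say which column is missing for each id.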
