Summarize missing values per column in a simple table with data.table

Asked Feb 11 '16 at 19:08

I want to calculate the number of missing values in a data.table.

library(data.table)
DT <- data.table(kkey = 1:10, data = (1:10)^2)
# blank out 4 random rows of column 1 and 5 random rows of column 2
for (i in 1:2) set(DT, sample(10, i + 3), i, NA)
> DT
    kkey data
 1:    1   NA
 2:   NA    4
 3:    3   NA
 4:    4   16
 5:   NA   NA
 6:    6   36
 7:   NA   NA
 8:    8   NA
 9:    9   81
10:   NA  100

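I can get the counts with any of the following, but none of them gives me the table I'm after:

# adds two new columns to DT by reference, repeating the counts on every row
DT[, c('missing.values.in.kkey', 'missing.values.in.data') :=
     lapply(.SD, function(x) sum(is.na(x)))]

# reports the NA count per column, mixed in with the other summary statistics
summary(DT)

# returns a plain list rather than a data.table
lapply(DT, function(x) sum(is.na(x)))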
$kkey
[1] 4

$data
[1] 5

But how do I create a simple one-row data.table like this, the data.table way?

      missing.kkey  missing.data
1:               4             5

edited Feb 11 '16 at 19:14

MichaelChirico

asked Feb 11 '16 at 19:08

Konstantinos


Instead of using `:=`, just keep the result in a `list(..)`. What happens with `:=` is that it assigns two new columns in the original dataset, which is not what you wanted. – akrun Feb 11 '16 at 19:11

I find `colSums(is.na(DT))` handy for such things – talat Feb 11 '16 at 19:13

What @akrun said... just remove everything in `j` to the left of (and including) `:=` (and mix with `setNames` if you're married to how the output is named) – MichaelChirico Feb 11 '16 at 19:13
Thank you guys! `missing.DT <- DT[, lapply(.SD, function(x) sum(is.na(x)))]` and then `setnames(missing.DT, paste0('missing.', names(DT)))` produces exactly what I wanted. I didn't know about `colSums`, which is much simpler to use, though! Thank you! – Konstantinos Feb 11 '16 at 19:22
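
For anyone landing here later, the two suggestions from the comments can be put side by side as a rough sketch. It rebuilds the question's `DT`; the names `missing.DT` and `missing.DT2` are just illustrative, and wrapping `colSums` in `as.data.table(as.list(...))` is one assumed way to turn its named vector into a one-row table, not something the commenters spelled out.

```r
library(data.table)

# Rebuild the example data from the question: 4 NAs in kkey, 5 NAs in data.
# sample() draws distinct rows, so the counts are fixed even though the
# positions of the NAs are random.
DT <- data.table(kkey = 1:10, data = (1:10)^2)
for (i in 1:2) set(DT, sample(10, i + 3), i, NA)

# 1) Summarise in j without `:=`, so a new one-row data.table is returned
#    and DT itself is left untouched.
missing.DT <- DT[, lapply(.SD, function(x) sum(is.na(x)))]
setnames(missing.DT, paste0("missing.", names(DT)))

# 2) colSums() route: is.na(DT) is a logical matrix, colSums() gives a named
#    numeric vector, and as.list() + as.data.table() make it a one-row table.
missing.DT2 <- as.data.table(as.list(colSums(is.na(DT))))
setnames(missing.DT2, paste0("missing.", names(DT)))

print(missing.DT)
print(missing.DT2)
```

Both tables should have the columns `missing.kkey` and `missing.data` holding 4 and 5. The first stores the counts as integers and the second as doubles, which only matters if you compare them with `identical()`.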
