
I want to count the missing values in each column of a data.table.

library(data.table)
DT <- data.table(kkey = 1:10, data = (1:10)^2)
# set 4 random rows of column 1 (kkey) and 5 random rows of column 2 (data) to NA
for (i in 1:2) set(DT, sample(10, i + 3), i, NA)
> DT
    kkey data
 1:    1   NA
 2:   NA    4
 3:    3   NA
 4:    4   16
 5:   NA   NA
 6:    6   36
 7:   NA   NA
 8:    8   NA
 9:    9   81
10:   NA  100

I can get the information I want with:

DT[, c('missing.values.in.kkey', 'missing.values.in.data') := 
     lapply(.SD, function(x) sum(is.na(x)))]

or

summary(DT)

or

lapply(DT, function(x) sum(is.na(x)))
$kkey
[1] 4

$data
[1] 5
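Because `lapply()` already returns a named list with one count per column, wrapping it in `as.data.table()` is one way to turn it into a one-row table. The sketch below assumes a hard-coded `DT` that copies the NA pattern printed above. The `set()`/`sample()` loop is random, so a rerun will put the NAs elsewhere. `na.list` and `na.counts` are hypothetical names:

```r
library(data.table)

# Hard-coded copy of the printed DT (assumption): 4 NAs in kkey, 5 in data
DT <- data.table(
  kkey = c(1L, NA, 3L, 4L, NA, 6L, NA, 8L, 9L, NA),
  data = c(NA, 4, NA, 16, NA, 36, NA, NA, 81, 100)
)

# lapply() over a data.table returns a named list, one element per column
na.list <- lapply(DT, function(x) sum(is.na(x)))

# as.data.table() turns that list into a one-row data.table
# that keeps the original column names (kkey, data)
na.counts <- as.data.table(na.list)
print(na.counts)
```

The column names are still `kkey` and `data`. They would need a rename to match the layout asked for below.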

But how do I create a simple data.table like this one, the data.table way?

   missing.kkey missing.data
1:            4            5
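The comments below point out that dropping `:=` makes `j` return a new data.table instead of adding columns to `DT`. `setnames()` can then rename that result by reference. This is a sketch, again assuming the hard-coded `DT` with the printed NA pattern:

```r
library(data.table)

# Hard-coded copy of the printed DT (assumption): 4 NAs in kkey, 5 in data
DT <- data.table(
  kkey = c(1L, NA, 3L, 4L, NA, 6L, NA, 8L, 9L, NA),
  data = c(NA, 4, NA, 16, NA, 36, NA, NA, 81, 100)
)

# With no `:=`, j returns the list built by lapply(.SD, ...) as a new
# one-row data.table; DT itself is not modified
missing.DT <- DT[, lapply(.SD, function(x) sum(is.na(x)))]

# Rename in place: kkey -> missing.kkey, data -> missing.data
setnames(missing.DT, paste0("missing.", names(DT)))
print(missing.DT)
```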
Edited by MichaelChirico; asked by Konstantinos.
    Instead of using `:=`, just keep the result in a `list(..)`. What happens with `:=` is that it assigns two new columns in the original dataset, which is not what you wanted. – akrun Feb 11 '16 at 19:11
    I find `colSums(is.na(DT))` handy for such things – talat Feb 11 '16 at 19:13
    What @akrun said... just remove everything in `j` to the left of (and including) `:=` (and mix with `setNames` if you're married to how the output is named). – MichaelChirico Feb 11 '16 at 19:13
    Thank you guys! `missing.DT <- DT[, lapply(.SD, function(x) sum(is.na(x)))]` and then `setnames(missing.DT, paste0('missing.', names(DT)))` produces exactly what I wanted. I didn't know about `colSums`, which is much simpler to use, though! Thank you! – Konstantinos Feb 11 '16 at 19:22
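The other suggestions in these comments can be sketched side by side on the same hard-coded `DT`, which copies the printed NA pattern as an assumption. The result names `by.list`, `by.setNames` and `by.colSums` are hypothetical. Note that `colSums()` returns doubles, whereas `sum()` over a logical vector returns integers:

```r
library(data.table)

# Hard-coded copy of the printed DT (assumption): 4 NAs in kkey, 5 in data
DT <- data.table(
  kkey = c(1L, NA, 3L, 4L, NA, 6L, NA, 8L, 9L, NA),
  data = c(NA, 4, NA, 16, NA, 36, NA, NA, 81, 100)
)

# 1. Return an explicit named list from j (akrun's suggestion)
by.list <- DT[, list(missing.kkey = sum(is.na(kkey)),
                     missing.data = sum(is.na(data)))]

# 2. Name the lapply() result inside j with setNames() (MichaelChirico's)
by.setNames <- DT[, setNames(lapply(.SD, function(x) sum(is.na(x))),
                             paste0("missing.", names(.SD)))]

# 3. colSums(is.na(DT)) gives a named numeric vector (talat's);
#    as.list() + as.data.table() turn it into a one-row data.table
na.vec <- colSums(is.na(DT))
names(na.vec) <- paste0("missing.", names(na.vec))
by.colSums <- as.data.table(as.list(na.vec))
```

All three produce one row with columns `missing.kkey` and `missing.data`. Only the `list()` version needs each column written out by hand, which gets tedious when there are many columns.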
