Fastest way to remove same number of NA from each column and realign data

Question

This extends this post, which gives the following result:

     x  y  z
 1:  1 NA NA
 2:  2 NA 22
 3:  3 13 23
 4:  4 14 24
 5:  5 15 25
 6:  6 16 26
 7:  7 17 27
 8: NA 18 28
 9: NA 19 NA
10: NA NA NA

As you can see, if the NAs in each column are removed, we obtain a data.table as follows:

   x  y  z
1: 1 13 22
2: 2 14 23
3: 3 15 24
4: 4 16 25
5: 5 17 26
6: 6 18 27
7: 7 19 28

I came up with this code to obtain the above result:

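# Start with the non-NA values of the first column
mat.temp <- na.omit(mat[, 1, with = FALSE])
for (i in 2:3) {
  # Drop the NAs from column i and bind it on as a new column
  temp <- na.omit(mat[, i, with = FALSE])
  mat.temp <- cbind(mat.temp, temp)
}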

However, I am not sure it is efficient. Could you please give me some suggestions?

Thank you

Similar but quite different task here: [Fastest way to drop rows with missing values](http://stackoverflow.com/questions/13755547/fastest-way-to-drop-rows-with-missing-values) – Matt Dowle Jul 13 '14 at 11:51
@MattDowle Is it possible to update the existing table when using `DT[, lapply(.SD, function(x) x[!is.na(x)])]`, instead of creating a new variable? Thank you – newbie Jul 14 '14 at 06:11

score 3 · Accepted Answer · answered Jul 13 '14 at 07:35

It sounds like you are just trying to do:

DT[, lapply(.SD, function(x) x[!is.na(x)])]
#    x  y  z
# 1: 1 13 22
# 2: 2 14 23
# 3: 3 15 24
# 4: 4 16 25
# 5: 5 17 26
# 6: 6 18 27
# 7: 7 19 28

However, I'm not sure how well this would hold up if you have a different number of NA values in each column.
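
For reference, here is a self-contained sketch of the above. The construction of `DT` is reconstructed from the table in the question, and `drop_na_pad` is a hypothetical helper (not part of data.table) that pads shorter columns with NA, so the result stays rectangular if the NA counts ever differ:

```r
library(data.table)

# Sample data reconstructed from the table shown in the question
DT <- data.table(
  x = c(1:7, NA, NA, NA),
  y = c(NA, NA, 13:19, NA),
  z = c(NA, 22:28, NA, NA)
)

# Equal NA counts: drop the NAs column by column, as in the answer
res <- DT[, lapply(.SD, function(x) x[!is.na(x)])]

# Hypothetical helper for unequal NA counts: drop the NAs, then pad
# every column with trailing NA up to the longest remaining column
drop_na_pad <- function(dt) {
  cols <- lapply(dt, function(x) x[!is.na(x)])
  n <- max(lengths(cols))
  # Indexing past the end of a vector yields NA, which does the padding
  as.data.table(lapply(cols, function(x) x[seq_len(n)]))
}

DT2 <- data.table(a = c(1L, NA, 3L), b = c(NA, NA, 5L))
padded <- drop_na_pad(DT2)
```

With equal NA counts the two approaches should give the same columns; the padding only matters once the counts differ.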

I forgot to mention that each column will have the same number of NAs, so I think your suggestion works in my case. Thanks – newbie Jul 13 '14 at 08:48
Just out of curiosity: is it possible to update the existing table when using `DT[, lapply(.SD, function(x) x[!is.na(x)])]`, instead of creating a new variable? – newbie Jul 14 '14 at 03:39
@newbie, not that I can think of. Better to ping Matt or Arun and see what they think. – A5C1D2H2I1M1N2O1R2T1 Jul 14 '14 at 03:42
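
On the in-place question: as far as I know, data.table cannot change the number of rows by reference, so the step that drops rows needs an assignment. What can be done by reference is shifting each column's non-NA values to the top with `set()`, leaving the NAs at the bottom. A minimal sketch, assuming the `DT` from the question and the same number of NAs in every column (the name `trimmed` is only for illustration):

```r
library(data.table)

# Sample data reconstructed from the table shown in the question
DT <- data.table(
  x = c(1:7, NA, NA, NA),
  y = c(NA, NA, 13:19, NA),
  z = c(NA, 22:28, NA, NA)
)

# Overwrite each column by reference: non-NA values first, NAs last.
# The row count stays at 10, so no new data.table is created here.
for (j in names(DT)) {
  v <- DT[[j]]
  set(DT, j = j, value = c(v[!is.na(v)], v[is.na(v)]))
}

# Dropping the trailing all-NA rows still produces a new table
trimmed <- na.omit(DT)
```

This only realigns the values; if the trailing NA rows have to go, the last step is equivalent in cost to the reassignment in the answer.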
