3

I have an R data.table:

library(data.table)
DT = data.table(x=rep(c("b","a",NA_character_),each=3), y=rep(c('A', NA_character_, 'C'), each=3), z=c(NA_character_), v=1:9)
DT
#    x  y  z v
#1:  b  A NA 1
#2:  b  A NA 2
#3:  b  A NA 3
#4:  a NA NA 4
#5:  a NA NA 5
#6:  a NA NA 6
#7: NA  C NA 7
#8: NA  C NA 8
#9: NA  C NA 9

For each column, I want the maximum of column v over the rows where that column is not NA (and NA when the column is entirely NA). I am using:

sapply(DT, function(x) { ifelse(all(is.na(x)), NA_integer_, max(DT[['v']][!is.na(x)])) })
 #x  y  z  v 
 #6  9 NA  9

Is there a simpler way to achieve this?

nicola
imsc

2 Answers

3

Here is one way. It returns -Inf (with a warning) for a column whose values are all NA; you can replace that with NA afterwards if you prefer:

DT[, lapply(.SD, function(x) max(v[!is.na(x)]))]
#    x y    z v
# 1: 6 9 -Inf 9
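If you go with this version and want NA instead of -Inf, one possible follow-up is sketched below. It is only an illustration: the name `res` and the use of `suppressWarnings()` and `replace()` are my own choices, not part of the original answer.

```r
library(data.table)

# Same toy data as in the question
DT <- data.table(
  x = rep(c("b", "a", NA_character_), each = 3),
  y = rep(c("A", NA_character_, "C"), each = 3),
  z = NA_character_,
  v = 1:9
)

# max() of an empty vector warns and returns -Inf; the warning is silenced here
res <- suppressWarnings(DT[, lapply(.SD, function(x) max(v[!is.na(x)]))])

# Swap any infinite entry for NA, column by column (`res` is an illustrative name)
res <- res[, lapply(.SD, function(col) replace(col, is.infinite(col), NA))]
res
```

Note that `suppressWarnings()` hides every warning raised by the call, so it is best kept to a small expression like this one.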

As suggested by @DavidArenburg, to get NA directly (and no warning) when all values of a column are NA, you can do:

DT[, lapply(.SD, function(x) {
  temp <- v[!is.na(x)] 
  if(!length(temp)) NA else max(temp)
})]
#   x y  z v
#1: 6 9 NA 9
David Arenburg
Cath
  • thanks @DavidArenburg and thanks for the more robust way, I'll add that! – Cath Nov 12 '15 at 12:55
  • Thanks. But isn't it the same as what I have now? – imsc Nov 12 '15 at 13:03
  • There is also the `na.rm` argument of `max` (which is faster than subsetting and taking the max). – nicola Nov 12 '15 at 13:03
  • `na.rm` removes `NA` but doesn't return `NA` if the whole column is `NA`. – imsc Nov 12 '15 at 13:05
  • @imsc this is very different from your code. 1- You are using `sapply` outside of the `DT`'s scope and thus it's not optimized. 2- Unnecessary use of `ifelse`, which can be very dangerous sometimes. 3- Your code isn't concise. This solution is pretty much the idiomatic `data.table` way, although there is probably some more code golfing that could be done. – David Arenburg Nov 12 '15 at 13:58
  • Thanks @DavidArenburg for the explanation. On a different note, why is `ifelse` dangerous? – imsc Nov 12 '15 at 22:49
  • `ifelse` has various side effects; here's one: http://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects. It is also not so efficient in general: http://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow – David Arenburg Nov 12 '15 at 22:51
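To make the `ifelse` point concrete, here is a small base R sketch of the Date side effect mentioned in the first link. The variable names are illustrative only.

```r
d <- as.Date("2015-11-12")

# ifelse() builds its result from the test vector, so the Date class is lost
via_ifelse <- ifelse(TRUE, d, d + 1)
class(via_ifelse)   # "numeric"

# A plain if/else returns the chosen branch unchanged
via_if <- if (TRUE) d else d + 1
class(via_if)       # "Date"
```

For a length-one condition such as `all(is.na(x))`, a plain `if`/`else` is therefore usually the safer choice.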
1

We can use summarise_each from dplyr:

library(dplyr)
DT %>%
   summarise_each(funs(max(v[!is.na(.)])))
#    x y    z v
#1: 6 9 -Inf 9
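`summarise_each()` and `funs()` have since been deprecated in dplyr. On dplyr 1.0.0 or later, roughly the same idea might be written with `across()`, as in the sketch below. It assumes a plain data.frame named `DF` (my own stand-in for `DT`) and adds an explicit empty-vector check so an all-NA column yields NA rather than -Inf; treat it as an illustration, not as the original answer's code.

```r
library(dplyr)

# Same toy data as in the question, built as a plain data.frame (assumed name DF)
DF <- data.frame(
  x = rep(c("b", "a", NA_character_), each = 3),
  y = rep(c("A", NA_character_, "C"), each = 3),
  z = NA_character_,
  v = 1:9,
  stringsAsFactors = FALSE
)

res <- DF %>%
  summarise(across(everything(), ~ {
    kept <- v[!is.na(.x)]   # values of v on the rows where this column is not NA
    if (length(kept) == 0L) NA_integer_ else max(kept)
  }))
res
```

The lambda refers to `v` directly, which works here because `v` is the last column and is therefore summarised only after the others.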
akrun
  • NOTE: I would be happy to use `%>%` and have the rest in 2 lines instead of cramming everything into one line and calling it a `one-liner`. – akrun Nov 12 '15 at 16:04