18

I have the working code below. When I replicate the same thing on a different data set, I get warnings :(

#max by values
df <- data.frame(age=c(5,NA,9), marks=c(1,2,7), story=c(2,9,NA))
df

df$colMax <- apply(df[,1:3], 1, function(x) max(x[x != 9],na.rm=TRUE))
df

I tried to do the same on a bigger data set and I am getting warnings. Why?

maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, function(x) max(x[x != 9],na.rm=TRUE))


50: In max(x[x != 9], na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

To understand the problem better I made the change below, but I am still getting warnings:

maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, function(x) max(x,na.rm=TRUE))
1: In max(x, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
user2543622
  • 3
    Can you figure out the problem from the result of `max(numeric(0))`, or do you need more explanation? – joran Jul 01 '14 at 21:52
  • still need help ...I tried class( numeric(0)) and it returned numeric..shouldnt max function work on it? – user2543622 Jul 01 '14 at 21:58
  • 4
    It _is_ working. If a vector has no elements in it, what is the maximum value? You're asking for the max of values that _are not 9 and are not NA_. Apparently sometimes that leaves nothing left. – joran Jul 01 '14 at 21:59
  • @joran: the maximum of nothing is obviously `-Inf`. – Joshua Ulrich Jul 01 '14 at 22:02
  • i tried max(maindata[1,c(paste("Q2",1:18,sep="_"))],na.rm=TRUE) and it returned a value – user2543622 Jul 01 '14 at 22:02
  • got it, how should i change my code if some of the rows contain either 9 or NA? – user2543622 Jul 01 '14 at 22:03
  • Depends on what you think is "correct". You aren't getting an error, just a warning. If you think `-Inf` doesn't make sense for those cases, change them to `NA` after the fact. Or omit those rows entirely. – joran Jul 01 '14 at 22:04
  • I would like max function to return "missing" when a row contain either 9 or NA – user2543622 Jul 01 '14 at 22:05
  • You can't mix character and numeric in a vector. `NA` means "missing" in R. Just change the `-Inf`'s to `NA`. Or just write a longer function with an `if` clause that checks that case first. – joran Jul 01 '14 at 22:09
  • I tried maindata$max_pc_age[maindata$max_pc_age==-Inf]<- "NA" and it solves my problem. do you see any problem with this line of a code? – user2543622 Jul 01 '14 at 22:11
  • 3
    Yes, a big one. Your whole vector will be characters not numbers. – joran Jul 01 '14 at 22:23
  • Got it...but i can work with characters...If you post your comments below I will mark that as accepted answer...thanks for educating me! – user2543622 Jul 03 '14 at 15:43
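
The difference between assigning the string "NA" and the constant `NA`, raised in the comments above, can be illustrated with a small sketch. The vector `max_pc_age` here is made-up toy data standing in for `maindata$max_pc_age`, and `is.infinite` is just one possible way to pick out the `-Inf` entries.

```r
# Toy vector standing in for maindata$max_pc_age (values are made up)
max_pc_age <- c(5, -Inf, 7)

# Assigning the string "NA" coerces the whole vector to character
as_text <- max_pc_age
as_text[as_text == -Inf] <- "NA"
class(as_text)              # "character"

# Assigning the constant NA keeps the vector numeric
as_num <- max_pc_age
as_num[is.infinite(as_num)] <- NA
class(as_num)               # "numeric"
mean(as_num, na.rm = TRUE)  # 6
```

With the numeric version, later arithmetic and summaries keep working and simply skip the missing rows.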

3 Answers

18

It seems that the problem has been pointed out in the comments already. Some rows contain only NAs (or only NAs and 9s, which are filtered out first), so max is handed nothing and reports -Inf, which I take from the comments you don't like. In this answer I would like to point out one possible way to tackle the issue, namely to build in a control statement (instead of overwriting -Inf after the fact, which is equally valid). For instance,

 # Return NA when every element is missing, otherwise the max of the non-missing values
 my.max <- function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)

does the trick. If every element in x is NA, then NA is returned, and the max otherwise. If you want any other value returned, just exchange NA for that value. You can also build this easily into your apply call. E.g.

 maindata$max_pc_age <- apply(maindata[,c(paste("Q2",1:18,sep="_"))], 1, my.max)
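
Note that the question's first attempt also dropped the code 9 before taking the max. A minimal sketch of a helper that handles both cases could look like the following. The name `my_max_excl` and its `sentinel` argument are hypothetical, and the data frame is the toy data used elsewhere in this thread, not `maindata`.

```r
# Hypothetical helper: drop NAs and a sentinel code, return NA if nothing is left
my_max_excl <- function(x, sentinel = 9) {
  keep <- x[!is.na(x) & x != sentinel]
  if (length(keep) == 0) NA_real_ else max(keep)
}

# Toy data: row 2 contains only NA and 9
df <- data.frame(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 9, NA))

row_max <- apply(df, 1, my_max_excl)
row_max
```

Checking `length(keep) == 0` rather than `all(is.na(x))` also covers rows that hold only 9s, and no warning is raised because max is never called on an empty vector.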

I am still sometimes confused by R's treatment of NA and empty sets. Statements like test <- NA; test==NA give NA as a result (instead of TRUE, as returned by is.na(test)), which is sometimes rationalized by saying that, since the value is missing, you cannot know whether two missing values are identical. In this case, however, max returns -Inf because it is given an empty set, which I think is not at all obvious. My experience, though, is that when strange and unexpected results pop up, NAs or empty sets are often involved.
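
The behaviours described above can be reproduced in a few lines. This is only an illustrative sketch on a made-up vector; the warnings are suppressed here so the values can be stored, but they are the same "no non-missing arguments to max" warnings shown in the question.

```r
x <- c(NA, NA, 9)

cmp <- x != 9        # NA NA FALSE: comparing with NA gives NA
filtered <- x[cmp]   # NA NA: an NA index yields NA, so the NAs survive the filter

# After na.rm = TRUE removes those NAs nothing is left, hence -Inf
row_result <- suppressWarnings(max(filtered, na.rm = TRUE))
empty_max  <- suppressWarnings(max(numeric(0)))

# One way to rationalize -Inf: it is the neutral element of max, so
# max(c(a, b)) equals max(max(a), max(b)) even when b is empty
a <- c(3, 1)
combined <- max(max(a), empty_max)   # 3
```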

coffeinjunky
1

In cases like the one below:

df[2,2] <- NA
df[1,2] <- -5

apply(df, 1, function(x) max(x[x != 9],na.rm=TRUE))
#[1]    5 -Inf    7
#Warning message:
#In max(x[x != 9], na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf

You could do:

df1 <- df  
minVal <- min(df1[!is.na(df1)])-1

df1[is.na(df1)|df1==9] <- minVal
val <- do.call(`pmax`, df1)
val[val==minVal] <- NA
val
#[1]  5 NA  7
akrun
  • 1
    +1 for pmax/pmin, although better methods could be developed when only one unlabeled argument is passed, precluding all this `do.call` business. You can overload it to make `na.rm=T` the default, or you can say, `do.call(pmax, c(df1, list(na.rm=T)))`. – AdamO Dec 20 '17 at 21:50
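
The `na.rm` variant suggested in the comment above can be sketched as follows, assuming the 9s are first recoded as NA instead of being replaced by a placeholder minimum. The recoding step with `replace` and `which` is an assumption of this sketch, not part of the original answer.

```r
# Toy data from this thread: row 2 contains only NA and 9
df <- data.frame(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 9, NA))

df1 <- df
# Recode the sentinel 9 as NA column by column; which() ignores NA comparisons
df1[] <- lapply(df1, function(col) replace(col, which(col == 9), NA))

# c() turns the data frame into a list of columns, so na.rm can be appended
val <- do.call(pmax, c(df1, list(na.rm = TRUE)))
val
```

With `na.rm = TRUE`, pmax returns NA only where every parallel element is missing, so no placeholder value or cleanup step is needed.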
1

You can use hablar::max_, which returns NA if all values are NA:

apply(df, 1, function(x) hablar::max_(x[x!=9]))
#[1]  5 NA  7

data

df <- structure(list(age = c(5, NA, 9), marks = c(-5, NA, 7), story = c(2, 
9, NA)), row.names = c(NA, -3L), class = "data.frame")

df
#  age marks story
#1   5    -5     2
#2  NA    NA     9
#3   9     7    NA
Ronak Shah