Consider the following data:

F1 <- c(1,1,1,0,1)
F2 <- c(10,20,15,7,20)
F3 <- c('A', 'D', 'B', 'A', 'A')
F4 <- c(9,6,20,20,20)
F5 <- c(2,1,21,8,7)
df1 <- data.frame(F1,F2,F3,F4,F5)

When df1$F1 == 1, I want the row-wise maximum of F4, F5 and F2, but F2 should only be considered if the factor F3 is A or B. When df1$F1 is not 1, the result should be NA.

df1$max <- with(df1, ifelse(F1==1, pmax(F2[F3_condition],F4,F5), NA))

How can one account for the F3_condition, where F2 is considered only if the factor F3 is A or B?

So $max will take the following values: c(10,6,21,NA,20)

I have reviewed a similar question, but it does not exactly deal with the specific condition I require.

user08041991

2 Answers

You can add another ifelse to modify the F2 vector before using pmax: replace the values where F3 is not A or B with -Inf, which can never exceed the values in F4 and F5, so those F2 entries are effectively ignored:

df1$max <- with(df1, ifelse(F1==1, pmax(ifelse(F3 %in% c("A", "B"), F2, -Inf), F4, F5), NA))
df1$max
# [1] 10  6 21 NA 20

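Alternatively, replace those values with NA and use na.rm = TRUE in pmax. Note that na.rm = TRUE also skips any NAs in F4 and F5, so which variant to use depends on whether those columns can contain NAs and how you want them handled:

df1$max <- with(df1, ifelse(F1==1, pmax(ifelse(F3 %in% c("A", "B"), F2, NA), F4, F5, na.rm = TRUE), NA))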
df1$max
# [1] 10  6 21 NA 20
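
To see where the two variants differ, here is a minimal sketch on a hypothetical copy of the data (df2, keep_F2, max_inf and max_narm are illustrative names, not part of the question) in which F4 has a missing value in the first row. With -Inf the NA propagates into the result, while with na.rm = TRUE it is skipped:

```r
# Hypothetical variant of the question's data: F4 has an NA in row 1
df2 <- data.frame(
  F1 = c(1, 1, 1, 0, 1),
  F2 = c(10, 20, 15, 7, 20),
  F3 = c("A", "D", "B", "A", "A"),
  F4 = c(NA, 6, 20, 20, 20),
  F5 = c(2, 1, 21, 8, 7)
)

# TRUE for rows where F2 is allowed to take part in the maximum
keep_F2 <- df2$F3 %in% c("A", "B")

# Variant 1: mask F2 with -Inf; an NA in F4 or F5 makes the row maximum NA
max_inf <- with(df2, ifelse(F1 == 1, pmax(ifelse(keep_F2, F2, -Inf), F4, F5), NA))

# Variant 2: mask F2 with NA and drop NAs; the NA in F4 is ignored as well
max_narm <- with(df2, ifelse(F1 == 1, pmax(ifelse(keep_F2, F2, NA), F4, F5, na.rm = TRUE), NA))

print(max_inf)
# [1] NA  6 21 NA 20
print(max_narm)
# [1] 10  6 21 NA 20
```

Which behaviour is preferable depends on whether a missing F4 or F5 should invalidate the row or simply be ignored.
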
Psidom

If all values of F4 and F5 are non-negative, as in the example, and F1 is only composed of 0s and 1s, then the following will also work:

with(df1, pmax(F2 * (F3 %in% c("A", "B")) , F4, F5) * NA^(!F1))
[1] 10  6 21 NA 20

Here, F2 * (F3 %in% c("A", "B")) returns 0 for F2 values where F3 is not A or B, because the logical vector is coerced to 0 and 1. pmax then calculates the row-wise maximum of the three vectors. Finally, the resulting vector is multiplied by NA^(!F1), which returns 1 when F1 != 0 (since NA^0 is 1 in R) and NA when F1 == 0.
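
The two building blocks can be checked in isolation. The sketch below first evaluates NA^(!F1) on the question's F1, then uses hypothetical negative columns (F4_neg and F5_neg, which are not from the question) to illustrate why the non-negativity condition matters: the 0 substituted for an excluded F2 value beats the negative values in that row.

```r
F1 <- c(1, 1, 1, 0, 1)
F2 <- c(10, 20, 15, 7, 20)
F3 <- c("A", "D", "B", "A", "A")

# !F1 is TRUE only where F1 == 0; NA^0 is 1 and NA^1 is NA
mask <- NA^(!F1)
print(mask)
# [1]  1  1  1 NA  1

# Hypothetical negative columns, for illustration only
F4_neg <- c(-9, -6, -20, -20, -20)
F5_neg <- c(-2, -1, -21, -8, -7)

# Row 2 has F3 == "D", so F2 is zeroed out; that 0 then beats -6 and -1,
# although the intended result for this row would be -1
res <- pmax(F2 * (F3 %in% c("A", "B")), F4_neg, F5_neg) * mask
print(res)
# [1] 10  0 15 NA 20
```

With data like this, the ifelse-based masking with -Inf or NA seems the safer choice.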

lmo