0

inspired by this question, I was wondering how can you flag the row with the max number by group, while keeping all the other rows as well, instead of removing them?

Lets say i want to look for the max in Value for every ID separately. Then flag the max row with an 1 and all the others with 0.

so, basically getting from this:

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
DF <- data.table(cbind(ID, Value, Event))
DF
   ID Value Event
1:  1     2     1
2:  1     3     1
3:  1     5     2
4:  2     2     1
5:  2     5     2
6:  2     8     1
7:  2    17     2
8:  3     3     2
9:  3     5     2

to this:

DF
   ID Value Event flagMAX
1:  1     2     1       0
2:  1     3     1       0
3:  1     5     2       1
4:  2     2     1       0
5:  2     5     2       0
6:  2     8     1       1
7:  2    17     2       1
8:  3     3     2       0
9:  3     5     2       0

how can this be done? ideally with data.table

i tried :

DF[,flagMAX := ifelse(max(Value), 1, 0), by = "ID"]

but get all 1

any ideas?

s_baldur
  • 29,441
  • 4
  • 36
  • 69
Nneka
  • 1,764
  • 2
  • 15
  • 39

2 Answers2

0

In case you accept an answer with tidyverse

library(tidyverse)

ID    <- c(1,1,1,2,2,2,2,3,3)
Value <- c(2,3,5,2,5,8,17,3,5)
Event <- c(1,1,2,1,2,1,2,2,2)
DF <- data.frame(cbind(ID, Value, Event))

DF %>% 
  group_by(ID) %>% 
  mutate(flag = ifelse(Value == max(Value), 1,0))
#> # A tibble: 9 x 4
#> # Groups:   ID [3]
#>      ID Value Event  flag
#>   <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     1     0
#> 2     1     3     1     0
#> 3     1     5     2     1
#> 4     2     2     1     0
#> 5     2     5     2     0
#> 6     2     8     1     0
#> 7     2    17     2     1
#> 8     3     3     2     0
#> 9     3     5     2     1
MarBlo
  • 4,195
  • 1
  • 13
  • 27
0

With data.table, we can group by 'ID', coerce the logical vector to binary with (as.integer or +) and assign (:=) the output to create new column 'flagMAX'

library(data.table)
DF[, flagMAX := +(Value == max(Value)), ID]

If we are grouping by both 'ID' and 'Event'

DF[, flagMAX := +(Value == max(Value)), by = .(ID, Event)]

In the OP's code, the ifelse 'test' argument is not a logical vector, but a single numeric value from max(Value), this gets coerced to logical resulting in any value greater than 0 to TRUE and others FALSE. Thus, resulting in 1 throughout

Illustration below

DF[, max(Value), ID]
#   ID V1
#1:  1  5
#2:  2 17
#3:  3  5

Now, we coerce it to logical

DF[, as.logical(max(Value)), ID]
#   ID   V1
#1:  1 TRUE
#2:  2 TRUE
#3:  3 TRUE

When we do the assignment with ifelse(TRUE, 1, 0), it returns all 1s.. If we have a 0 value as max, it returns FALSE

as.logical(c(5, 17, 5, 0))
#[1]  TRUE  TRUE  TRUE FALSE
akrun
  • 874,273
  • 37
  • 540
  • 662