
With data like the sample below, I have hourly rows for each day for each (area, loc) pair. For each day and (area, loc) pair, I need to find the row where the value of `a` is maximum.

day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1
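To try the answers below, the sample can be read straight into a data frame named `df`, the name used in the question. This is a minimal sketch that assumes the columns are in the order the data rows actually use (day, area, loc, hour), and that `hour` should stay a two-character string such as "00" rather than become the number 0.

```r
# Sample data from the question. Assumed column order follows the data rows:
# day, area, loc, hour, a, b, c
txt <- "day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1"

# Explicit colClasses keep hour as "00"/"22"/"23" instead of 0/22/23
df <- read.csv(text = txt,
               colClasses = c("integer", "character", "character", "character",
                              "numeric", "numeric", "numeric"))
str(df)
```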

Expected output:

day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20190101,ar01,loc01,00,98,10.9,22.5
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,23,97.2,10.9,22.1

I can group the data with dplyr, e.g. `df %>% group_by(day, area, loc)`, but how do I get the matching rows from there?

user3206440
  • `df %>% group_by(day, area, loc) %>% slice(which.max(a))` – Ronak Shah Nov 25 '19 at 06:41
  • @RonakShah, just a quick question: doesn't `filter` give the same output as `slice`? Is there any difference? – dc37 Nov 25 '19 at 06:46
  • @dc37 Because `filter` compares values with `==`, in case of a tie it returns all the rows equal to the maximum value, whereas `which.max` in `slice` returns only the first maximum. This example may clarify: with `x <- c(1:5, 5)`, compare the output of `which.max(x == 5)` (which gives 5) and `which(x == 5)` (which gives 5 6). – Ronak Shah Nov 25 '19 at 06:49
  • @RonakShah, thanks for this clarification. From the question, I'm not sure whether `filter` or `slice` is what's needed. – dc37 Nov 25 '19 at 06:52
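The tie behaviour discussed in these comments can be checked on a small made-up data frame. The name `ties` and its values are hypothetical, chosen so that one group has two rows sharing the maximum; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: group "g1" has two rows tied at the maximum a = 5
ties <- data.frame(g  = c("g1", "g1", "g1", "g2"),
                   id = 1:4,
                   a  = c(5, 3, 5, 2),
                   stringsAsFactors = FALSE)

# filter() keeps every row equal to its group's maximum: ids 1, 3 and 4
by_filter <- ties %>% group_by(g) %>% filter(a == max(a)) %>% ungroup()

# slice(which.max(a)) keeps only the first maximum per group: ids 1 and 4
by_slice <- ties %>% group_by(g) %>% slice(which.max(a)) %>% ungroup()

by_filter$id
by_slice$id
```

On the question's sample there are no ties within a (day, area, loc) group, so both approaches return the same four rows.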

1 Answer


You can try:

library(dplyr)

# keep every row whose a equals its group's maximum (tied maxima are all kept)
df %>%
  group_by(day, area, loc) %>%
  filter(a == max(a)) %>%
  ungroup()
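In dplyr 1.0.0 and later, `slice_max()` covers both behaviours through its `with_ties` argument. This is a sketch assuming a dplyr version that provides `slice_max()`; the data frame is rebuilt from the question's sample (with `hour` kept as text) so the block runs on its own, and `res` is just an illustrative name.

```r
library(dplyr)

# The question's sample, rebuilt column by column (hour kept as text)
df <- data.frame(
  day  = rep(c(20181231, 20181231, 20190101, 20190101), 2),
  area = rep(c("ar01", "ar02"), each = 4),
  loc  = "loc01",
  hour = rep(c("00", "22", "00", "23"), 2),
  a    = c(99, 96, 98, 97, 93, 96, 97, 97.2),
  b    = rep(c(11.3, 12.3, 10.9, 10.9), 2),
  c    = rep(c(18.2, 15.2, 22.5, 22.1), 2),
  stringsAsFactors = FALSE
)

# with_ties = FALSE returns exactly one row per group, like slice(which.max(a));
# the default with_ties = TRUE keeps all tied maxima, like the filter() answer
res <- df %>%
  group_by(day, area, loc) %>%
  slice_max(a, n = 1, with_ties = FALSE) %>%
  ungroup()

res
```

The rows come back ordered by group (day, then area), so they match the expected output up to row order.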
dc37