4

I am relatively new to R,trying to get sum of column based on other columns .my data frame is like below

YEAR EventID LOSS
    1     554  334740
    1     415  149816
    1     207  199446
    2     961  527042
    3     614  188199
    4     968   87044
    4     650   75377
    4     341  424678
    4     397  210730
    5     610  368068

Here my condition is , I need the sum of HIGHEST loss by each year.

loss numbers of : EVENTID 554 from YEAR 1,EVENTID 961 from YEAR 2, EVENTID 341 from 4, EVENTID 610 from YEAR 5.

Rags
  • 133
  • 1
  • 1
  • 7
  • This kind of thing is really easy with dplyr package. There are lots of tutorials available. Here is one, for example: [http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-with-r/](http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-with-r/) – Harold Ship Jun 14 '16 at 10:30
  • You have to edit your post: EventID 968 is not the highest one in year 4. – J_F Jun 14 '16 at 10:32
  • `sum(tapply(df$LOSS, df$YEAR, max))` – Ernest A Jun 14 '16 at 10:46

3 Answers3

3

This is an easy one with dplyr:

d %>%
   group_by(YEAR) %>%
   summarize(EventID = EventID[1],
             max_loss = max(LOSS))

You can also have multiple occurrences of events with the same max loss, then you would need to merge back to original dataset and filter:

d %>%
   group_by(YEAR) %>%
   summarize(max_loss = max(LOSS)) %>%
   left_join(d, .) %>%
   filter(LOSS == max_loss)
toni057
  • 582
  • 1
  • 4
  • 10
1

My preference goes to sqldf for this sort of tasks:

library(sqldf)
sqldf("select * from mydata 
       group by YEAR
       having LOSS=max(LOSS);")

Results

##  YEAR EventID   LOSS
##     1     554 334740
##     2     961 527042
##     3     614 188199
##     4     341 424678
##     5     610 368068
Dominic Comtois
  • 10,230
  • 1
  • 39
  • 61
0
sum(unique(ave(df$LOSS, df$YEAR, FUN = max)))

Output:

[1] 1842727
mpalanco
  • 12,960
  • 2
  • 59
  • 67