Sum of column values based on other columns in R

Question

I am relatively new to R,trying to get sum of column based on other columns .my data frame is like below

YEAR EventID LOSS
    1     554  334740
    1     415  149816
    1     207  199446
    2     961  527042
    3     614  188199
    4     968   87044
    4     650   75377
    4     341  424678
    4     397  210730
    5     610  368068

Here my condition is , I need the sum of HIGHEST loss by each year.

loss numbers of : EVENTID 554 from YEAR 1,EVENTID 961 from YEAR 2, EVENTID 341 from 4, EVENTID 610 from YEAR 5.

This kind of thing is really easy with dplyr package. There are lots of tutorials available. Here is one, for example: [http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-with-r/](http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-with-r/) — Harold Ship, Jun 14 '16 at 10:30
You have to edit your post: EventID 968 is not the highest one in year 4. — J_F, Jun 14 '16 at 10:32

score 3 · Accepted Answer · answered Jun 14 '16 at 10:53

This is an easy one with dplyr:

d %>%
   group_by(YEAR) %>%
   summarize(EventID = EventID[1],
             max_loss = max(LOSS))

You can also have multiple occurrences of events with the same max loss, then you would need to merge back to original dataset and filter:

d %>%
   group_by(YEAR) %>%
   summarize(max_loss = max(LOSS)) %>%
   left_join(d, .) %>%
   filter(LOSS == max_loss)

score 1 · Answer 2 · answered Jun 14 '16 at 11:28

My preference goes to sqldf for this sort of tasks:

library(sqldf)
sqldf("select * from mydata 
       group by YEAR
       having LOSS=max(LOSS);")

Results

##  YEAR EventID   LOSS
##     1     554 334740
##     2     961 527042
##     3     614 188199
##     4     341 424678
##     5     610 368068

score 0 · Answer 3 · answered Jun 14 '16 at 10:47

0

sum(unique(ave(df$LOSS, df$YEAR, FUN = max)))

Output:

[1] 1842727

answered Jun 14 '16 at 10:47

mpalanco

12,960
2
59
67

Sum of column values based on other columns in R

3 Answers3