2

I would like to write code that functions similar to that of an AVERAGEIFS function in Excel. Essentially, I would like to take the average of a variable given certain conditions are met. I have not been successful in finding a similar problem online given my situation calls for a unique condition that excludes certain instances.

What I am trying to accomplish is calculating the mean of total revenue (TOTALREV) for each vessel ID (VESSEL_ID) in all survey years (SURVEY_YEAR)--Excluding the current survey year. I have tried creating an additional column called YEAR to create a conditional "not equal" as well as what my code currently depicts.

CPTWG <- CEEDCdatEXP[,c("VESSEL_ID","SURVEY_YEAR","TOTALREV")] %>%
  group_by(VESSEL_ID) %>%
  summarize(AVERAGEREV = mean(TOTALREV[SURVEY_YEAR != SURVEY_YEAR]))

This code should create my ideal outcome:

VESSEL_ID <- c(1,2,3,1,2,3,1,2,3)
SURVEY_YEAR <- c(2009,2009,2009,2010,2010,2010,2011,2011,2011)
TOTALREV <- c(2718,9939,4014,7019,2016,2025,3218,7727,7252)
AVERAGEREV <- c(5118.5,4871.5,4638.5,2968,8833,5633,4868.5,5977.5,3019.5)
mydata <- data.frame(VESSEL_ID,SURVEY_YEAR,TOTALREV,AVERAGEREV)
Gary Eaton
  • 21
  • 1
  • Extremely slick answer to this [here](https://stackoverflow.com/questions/35858876/calculate-group-mean-while-excluding-current-observation-using-dplyr). For your purposes...`mydata <- group_by(VESSEL_ID) %>% mutate(AVERAGEREV = (sum(TOTALREV) - TOTALREV) / (n() - 1))` – Nick Criswell Mar 27 '19 at 21:04
  • @NickCriswell--This is great! Thank you for digging this up as I was unable to do... In addition, I thought this would take me home, but to make things slightly more complex, I have one more column flagging each observation as an outlier or not. Any suggestions on how I might alter the mutate to incorporate only those data where outlier == 0 ? – Gary Eaton Mar 27 '19 at 22:13
  • I was going to say that you could add some kind of dummy column that would take on `NA` when your outliers happen. Then you could change your `sum` to work on that column with an `na.rm = TRUE` argument. A couple of other options (including one from the author of `dplyr`) are linked [here](https://stackoverflow.com/questions/23528862/summarize-with-conditions-in-dplyr). You would probably have to tweak the `n() - 1` part, too, since your would likely have a lower denominator in your average. – Nick Criswell Mar 27 '19 at 23:05
  • Thanks @NickCriswell! – Gary Eaton Mar 27 '19 at 23:22

0 Answers0