I'm resorting to R to handling averageif()
and maxif()
like functions because my data set is too large and Excel keeps crashing.
I'm looking for a way to find the average
wind
based on status
, and between columns Start Date
and End Date
from df
. I'd imagine this would involve the between
function in R.
status<-c("tropical depression", "tropical depression", "tropical storm")
Begin_Date <- as.Date(c("1974/06/01","1980/06/05","1990/06/07"))
End_Date <- as.Date(c("1975/06/01","1981/07/05","1991/08/07"))
df<-as.data.frame(cbind(status,Begin_Date,End_Date))
df$Begin_Date<-as.Date(Begin_Date)
df$End_Date<-as.Date(End_Date)
df$status<-as.character(status)
storms$date <- as.Date(with(storms, paste(year, month, day,sep="-")), "%Y-%m-%d")
(Note: my actual problem is billing data, so I would need a solution to gather an average based on IDs, between a start billing date and end billing date).
Take the storms
dataset from dplyr
as an example.
starting with my df
below:
status Begin_Date End_Date
tropical depression 1974-06-01 1975-06-01
tropical depression 1980-06-05 1981-07-05
tropical storm 1990-06-07 1991-08-07
What I'd like:
status Begin_Date End_Date Avg Wind
tropical depression 1974-06-01 1975-06-01 44.3
tropical depression 1980-06-05 1981-07-05 66.7
tropical storm 1990-06-07 1991-08-07 56
I'm trying to get this to be dplyr
compliant as well.
My attempt is erroring out:
df$Average_wind < - storms %>% filter (status = df$status, date >= df$Begin_Date,date<df$End_Date) %>% summarise(avg = mean(wind))
Error in FUN(left) : invalid argument to unary operator
The average wind and max wind values in the "I'd like" example are not accurate, simply for formatting purposes.