This is a part of my dataset, which has similar data for 12 years and I want to calculate the cumulative sum of when the outflow changes from 0 to a value and store it as one event. I then want to find the maximum, minimum, mean and standard deviation of the event and group it by each year.
For example, in the dataset below, there are the two events. What would be the sum of each of those events and what would be the max, min, mean and SD of the two events?
Date Outflow
2/2/2014 0
2/2/2014 0
2/2/2014 0.073
2/2/2014 0.096
2/2/2014 0.096
2/2/2014 0.096
2/2/2014 0.015
2/2/2014 0.007
2/2/2014 0.003
2/2/2014 0.002
2/2/2014 0.001
2/2/2014 0.001
2/3/2014 0
2/3/2014 0
2/3/2014 0
2/3/2014 0
2/3/2014 0.022
2/3/2014 0.031
2/3/2014 0.034
2/3/2014 0.036
2/3/2014 0.037
2/3/2014 0.038
2/3/2014 0.038
2/3/2014 0.038
2/3/2014 0.038
2/3/2014 0.104
I have been able to count the number of events for each year but I haven't been able to calculate the sum of each of those events.
I have included code I used to calculate the number of event for each year. LID_yearly is when the dataset is split by year and the code is run.
m=rep(0,12) ## intialize a counter
x="no_value"
for(k in 1:length(LID_yearly)){
for(i in 1:length(LID_yearly[[k]]$`Surface Runoff`)){
if(LID_yearly[[k]]$`Surface Runoff`[i]==0){
m[k]=m[k]
x="no_value"
}
else if((LID_yearly[[k]]$`Surface Runoff`[i]!=0)&&(x=="no_value")){
m[k]=m[k]+1
x="value"
} else if((LID_yearly[[k]]$`Surface Runoff`[i]!=0)&&(x=="value")){
m[k]=m[k]
x="value"
}
}
}
I want to get a column for each statistic I am looking for (eg. max, min etc.) for the events is each year.
This is my current output that I have.
"Year Sum Max Min Mean SD No.of inflow No. of Outflow
"2007 0 0 NA NA NA 122 0"
"2008 0 0 NA NA NA 106 0"
"2009 0 0 NA NA NA 149 0"
"2010 0 0 NA NA NA 122 0"
"2011 0 0 NA NA NA 125 0"
"2012 0 0 NA NA NA 82 0"
"2013 10 0.226 0.001 0.092 0.077 125 5"
"2014 46 0.325 0.001 0.078 0.073 149 14"
"2015 56 0.26 0.001 0.087 0.069 153 15"
"2016 21 0.135 0.001 0.036 0.029 136 15"
"2017 11 0.22 0.002 0.086 0.068 148 4"
"2018 0 0 NA NA NA 138 0"
Here the last two columns are counting the number of events but the statistics are not what I am looking for as it calculates the mean or max of every single value rather than as an event.
Here is the code I used using dplyr:
LIDTDFNEW <- LIDnew %>% group_by(year) %>% summarise("Sum of Bypass
Volume"=sum(`Surface Runoff`),
"Maximum Bypass Volume"=max(`Surface Runoff`),
"Minimum Byass Volume"=(if (!all(`Surface Runoff`==0)) min(`Surface
Runoff`[`Surface Runoff`!=0])else NA),
"Average Bypass Volume"= (if(!all(`Surface Runoff`==0)) mean(`Surface
Runoff`[`Surface Runoff`!=0])else NA),
"SD of Byass Volume"=(if (!all(`Surface Runoff`==0)) sd(`Surface
Runoff`[`Surface Runoff`!=0])else NA))