I need to calculate several types of summary statistics across multiple columns of data, using multiple grouping variables with a varying number of values per group. An example data set and the expected result data frame are below.
I know how to do these calculations with `dplyr`, but I am moving away from `dplyr` because I keep having to update my code as function names and behaviour change between releases. So… I am packing my bags and moving to base R.
```r
# Example data: STUFF has one missing value (site B, date 3)
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Desired result: one row per SITE/DATE combination
RESULT = data.frame(SITE = c("A","A","B","C"),
                    DATE = c("1","2","3","4"),
                    MEAN_STUFF = c(1.5, 35, 150, 5500),
                    MEAN_STUFF2 = c(3, 70, 400, 11000))
```
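For reference, the RESULT values are per-group means that skip only the single NA cell. A minimal sketch, assuming only base R (the `stats` package is attached by default), that reproduces them with the data.frame method of `aggregate()`. That method has no `na.action` step, so every row reaches `FUN`, and `na.rm = TRUE` is forwarded to `mean()`. The name `res_df` is just an illustrative variable name.

```r
# Rebuild the example data from the question
DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# data.frame method: value columns in x, grouping columns in 'by'.
# No rows are removed beforehand; na.rm = TRUE is passed on to mean()
# through '...', so only the NA cell in STUFF is ignored and STUFF2
# for site B still uses all three values.
res_df <- aggregate(DATA[c("STUFF", "STUFF2")],
                    by  = DATA[c("SITE", "DATE")],
                    FUN = mean, na.rm = TRUE)

# Rename the value columns to match the RESULT layout
names(res_df)[3:4] <- c("MEAN_STUFF", "MEAN_STUFF2")
print(res_df)
```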
I tried the code below, but because of the missing value, the mean of STUFF2 for site B is calculated from only two values rather than three.
```r
# Result stored in MEAN (not mean) so that base::mean is not masked
MEAN = aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, FUN = mean, data = DATA)
```
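A small diagnostic sketch of why this happens, assuming the documented default of the formula method (`na.action = na.omit`): the NA filter is applied to the model frame built from every variable in the formula, so row 7 (NA in STUFF) is dropped for STUFF2 as well. The names `kept`, `b_all`, `b_kept` and `res_omit` are illustrative only.

```r
DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# Rows that survive na.omit on all formula variables
kept <- complete.cases(DATA[c("STUFF", "STUFF2", "SITE", "DATE")])
print(which(!kept))  # 7: the whole row is dropped, including STUFF2 = 600

# STUFF2 mean for site B with and without that row
b_all  <- mean(DATA$STUFF2[DATA$SITE == "B"])          # 400
b_kept <- mean(DATA$STUFF2[DATA$SITE == "B" & kept])   # 300

# The formula call from the question gives the reduced value for site B
res_omit <- aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, data = DATA, FUN = mean)
print(res_omit)
```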
I also tried setting the na.action argument explicitly, but the same issue occurred (na.omit is already the default for the formula method of aggregate()).
```r
MEAN = aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, FUN = mean, data = DATA, na.action = na.omit)
```
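One base-R option, sketched here as a suggestion rather than a definitive answer: keep every row with `na.action = na.pass` and let `FUN` drop NAs column by column with `na.rm = TRUE`. The helper `summ()` is hypothetical, added only to show how several statistics can be returned at once; `aggregate()` stores a vector-valued result as a matrix column, which `do.call(data.frame, ...)` flattens into columns such as `STUFF.mean` and `STUFF.n`.

```r
DATA <- data.frame(SITE   = c("A","A","A","A","B","B","B","C","C"),
                   DATE   = c("1","1","2","2","3","3","3","4","4"),
                   STUFF  = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# na.pass keeps row 7 in the model frame; na.rm = TRUE is forwarded to
# mean() through '...', so each column ignores only its own NAs.
res_pass <- aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, data = DATA,
                      FUN = mean, na.rm = TRUE, na.action = na.pass)
print(res_pass)

# Hypothetical helper returning several statistics per group and column
summ <- function(x) c(mean = mean(x, na.rm = TRUE), n = sum(!is.na(x)))

res_multi <- aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, data = DATA,
                       FUN = summ, na.action = na.pass)
# Flatten the matrix columns into STUFF.mean, STUFF.n, STUFF2.mean, STUFF2.n
res_multi <- do.call(data.frame, res_multi)
print(res_multi)
```

With this setup, STUFF for site B is averaged over its two non-missing values (150), while STUFF2 for site B uses all three values (400).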
Any recommendations using base R would be helpful.
Thanks in advance for your time.