
I need to calculate several types of summary statistics across multiple columns of data, using multiple grouping variables with a varying number of values per group. An example data set and the desired result data frame are below.

I know how to do these calculations using dplyr, but I am moving away from dplyr because I regularly have to update code as function names and behavior change between releases. So…I am packing my bags and moving to base R.

DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"), 
                  DATE = c("1","1","2","2","3","3","3","4","4"), 
                  STUFF = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

RESULT = data.frame(SITE = c("A","A","B","C"), 
                    DATE = c("1","2","3","4"), 
                    MEAN_STUFF = c(1.5, 35, 150, 5500),
                    MEAN_STUFF2 = c(3, 70, 400, 11000))

I tried the code below, but the missing value in STUFF causes the mean for STUFF2 in that group (SITE B) to be calculated from only two values rather than three.

mean = aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, mean, data = DATA )

I also tried running the code with the na.action argument set, but the same issue occurred.

mean = aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, mean, data = DATA , na.action = na.omit)
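One possible direction, offered as a sketch rather than a definitive fix: the formula interface of aggregate() applies na.action to whole rows before FUN is called, and na.omit is its default, which would explain why both attempts above drop the third SITE B row for STUFF2 as well. The example below assumes that passing na.action = na.pass together with na.rm = TRUE (forwarded to mean()) gives the per-column behavior described. The object name MEANS and the renaming step are illustrative choices, not part of the original code.

```r
# Example data from the question.
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, NA, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))

# na.pass keeps rows that contain NA, so every value reaches mean();
# na.rm = TRUE is forwarded to mean() and drops NAs one column at a time.
MEANS <- aggregate(cbind(STUFF, STUFF2) ~ SITE + DATE, data = DATA,
                   FUN = mean, na.rm = TRUE, na.action = na.pass)

# Rename the summary columns to match the layout of RESULT (illustrative).
names(MEANS)[3:4] <- c("MEAN_STUFF", "MEAN_STUFF2")

print(MEANS)
```

The non-formula interface, for example aggregate(DATA[c("STUFF", "STUFF2")], by = DATA[c("SITE", "DATE")], FUN = mean, na.rm = TRUE), should behave the same way, since it does not filter rows before calling FUN.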

Any recommendations using base R would be helpful.

Thanks in advance for your time.

Vesuccio
