1

I have to translate someones SAS code into R as my company is moving away from SAS and was wondering if there were a comparable R function to proc means? For reference, proc means produces (or at least what I need it to produce) is the number of observations, the mean, min, max, and standard deviation. It is also important that this be able to be done by groups as well as weighted. The summary function in R produces this output but I don't think you can do it weighted. I would prefer an answer that uses base R but if that doesn't exist, a package would be okay.

Example:

df
   Temp V1  Weight
1  Hi    1  8 
2  Low   2  3 
3  Hi    3  9 
4  Low   4  9 

I need to produce various weighted summary statistics (median,min,max,sum,mean etc) of V1 by group as follows:

Group Min Max Mean Sum
Hi      1  3  2.06 35
Low     2  4  3.5  42
 
astel
  • 192
  • 7
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pretend the SAS function doesn't exist and just explain exactly what you want to do in R. – MrFlick Oct 20 '20 at 18:01
  • 1
    As an R and SAS programmer, there isn't a great one out there IMO especially if you need summaries at different levels. PROC FREQ/MEANS are the biggest things I miss in SAS.. I usually just use tidyverse approach to build my own statistics, so more similar to using PROC SQL than PROC MEANS. – Reeza Oct 20 '20 at 19:03
  • @astel, the RStudio community is more helpful in general - not as active as the SAS community forum but you'll get answers in a reasonable time frame. https://community.rstudio.com/ – Reeza Oct 20 '20 at 20:02
  • There are many, many functions that have been written to summarize data. You can probably find something that does exactly what you want if you are will to explore. Some examples (package::function): `base::summary`, `RcmdrMisc::numSummary`, `DecTools::Desc`, `Hmisc::describe`, 'pastecs::stat.desc', and `psych::describe`. This is not exhaustive by any means, just some of the ones I've found useful. – dcarlson Oct 20 '20 at 23:06

1 Answers1

1

Generally, aggregate in base R can serve as counterpart to SAS's proc means. Typically, this method runs a single simple function (that takes one input) on a single numeric column. Below is the formula version of the method:

aggregate(V1 ~ Temp, df, FUN=mean)

But aggregate can be extended for multiple columns and multiple functions:

agg_raw <- aggregate(cbind(V1, Weight) ~ Temp, df, 
                     FUN = function(x) c(count = length(x),
                              min = min(x), 
                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[2],
                              median = median(x),
                              mean = mean(x),
                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[4],
                              sum = sum(x))
                    )

agg_df <- do.call(data.frame, agg_raw)
agg_df
#   Temp V1.count V1.min V1.p.25. V1.median V1.mean V1.p.75. V1.sum Weight.count Weight.min Weight.p.25. Weight.median Weight.mean Weight.p.75. Weight.sum
# 1   Hi        2      1      1.5         2       2      2.5      4            2          8         8.25           8.5         8.5         8.75         17
# 2  Low        2      2      2.5         3       3      3.5      6            2          3         4.50           6.0         6.0         7.50         12

For weighted statistics, aggregate does not have a built-in facility, so a bit more nuance is needed where you compute the statistics before aggregation. R's ave allows calculation across groups:

df$V1_wtmean <- ave(df[c("V1", "Weight")], df$Temp, FUN=function(x) weighted.mean(x$V1, x$Weight))[[1]]
df$V1_wtsum <- ave(df[c("V1", "Weight")], df$Temp, FUN=function(x) sum(x$V1 * x$Weight))[[1]]

df
#   Temp V1 Weight V1_wtmean V1_wtsum
# 1   Hi  1      8  2.058824       35
# 2  Low  2      3  3.500000       42
# 3   Hi  3      9  2.058824       35
# 4  Low  4      9  3.500000       42

Once you have such columns, you can aggregate on min and max to return grouping needs:

agg_df <- do.call(data.frame,
                  aggregate(cbind(V1, V1_wtmean, V1_wtsum) ~ Temp, df, 
                            FUN=function(x) c(min=min(x), 
                                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[2],
                                              mean=mean(x),
                                              p = quantile(x, 2, probs = seq(0, 1, 0.25))[4],
                                              max=max(x), 
                                              sum=sum(x)))
                  )
                          
agg_df <- setNames(agg_df[c("V1.min", "V1.max", "V1.mean", "V1.sum", "V1_wtmean.min", "V1_wtsum.min")],
                   c("Min", "Max","Simple.Mean", "Simple.Sum", "Wgt.Mean", "Wgt.Sum"))
                                
agg_df
            
#   Min Max Simple.Mean Simple.Sum Wgt.Mean Wgt.Sum
# 1   1   3           2          4 2.058824      35
# 2   2   4           3          6 3.500000      42     

Online Demo

See Also:

Parfait
  • 104,375
  • 17
  • 94
  • 125