Summarizing outcomes by groups in R

Question

The following code works....

sum(  (WASDATj$HCNT == 1 | WASDATj$HCNT == -1 |  WASDATj$HCNT == 0  ) & WASDATj$Region=='United States' 
    & WASDATj$Unit=='Million Bushels' 
    & WASDATj$Commodity=='Soybeans'
    &  WASDATj$Attribute == 'Production' 
    &  WASDATj$Fdex.x == 10
    ,na.rm=TRUE  
    )

It counts the number of observations where HCNT takes a value of -1,1,0 it provides a single number for this category. The variable WASDATj$Fdex.x takes a value from 1-20.

How can I generalize this to count the number of observations that take a value -1,1,0 for each of the values of Fdex.x (so provide me 20 sums for Fdex.x from 1-20)? I did look for an answer, but I'm such a novice I may have missed what is an obvious answer....

Hi Its-ame Mari! Welcome to SO! Very likely we can help and improve what you have so far. To help further, please edit your question with example data from `WASDATj` using `dput(head(WASDATj))` and include what your final result should like (perhaps a data frame with 2 columns, 1st column Fdex.x value 1-20 and 2nd column with the sum). Is the `na.rm` for `NA` values anywhere in your data or just intended for one column? Finally, please see [how to make a great reproducible example in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Ben, Oct 14 '20 at 01:06

score 0 · Answer 1 · answered Oct 14 '20 at 20:03

Simply extend your sum of a boolean vector to aggregate function using length which is essentially a count aggregation and analogous to your sum of TRUE:

agg_df <- aggregate(cbind(Count=HCNT) ~ Fdex.x, 
                    data=WASDATj[WASDATj$HCNT %in% c(1,-1, 0) &
                                 WASDATj$Region=='United States' & 
                                 WASDATj$Unit=='Million Bushels' &
                                 WASDATj$Commodity=='Soybeans' &
                                 WASDATj$Attribute=='Production', ],
                    FUN=length)

Result should be a data frame of 20 rows by two columns for each distinct Fdex.x value and corresponding count.

And if needed, you can extend grouping for other counts by adjusting formula and data filter:

agg_df <- aggregate(cbind(Count=HCNT) ~ Fdex.x + Region + Unit + Commodity + Attribute, 
                    data=WASDATj[WASDATj$HCNT %in% c(1,-1, 0), ],
                    FUN=length)

Summarizing outcomes by groups in R

1 Answers1