I am trying to create an aggregate table with the mean and count from the following data:

> sample_data
   sample percent
1       A       5
2       A       2
3       A       3
4       B       7
5       B       7
6       C       4
7       C       3
8       C       2
9       C       3
10      D       5

I use this call:

aggregate_sample = aggregate(sample_data[, 2], list(sample_data$sample), FUN = function(x) c(mn = mean(x), ln = length(x)))

The console output shows what I want:

> aggregate_sample
  Group.1     x.mn     x.ln
1       A 3.333333 3.000000
2       B 7.000000 2.000000
3       C 3.000000 4.000000
4       D 5.000000 1.000000

However, when I click on aggregate_sample under Data, I only see this:

  Group.1        x
1       A 3.333333
2       B 7.000000
3       C 3.000000
4       D 5.000000

Can anyone help me get the right table?

Norman Kuo

1 Answer

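The issue is that the column 'x' is a matrix with 2 columns, because FUN returns c(mn=mean(x),ln=length(x)). The console prints the matrix columns as x.mn and x.ln, but the data frame itself holds only Group.1 and x. We can change it to a regular data.frame with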

aggregate_sample1 <- do.call(data.frame, aggregate_sample)
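
To see the structure described above, a minimal base R sketch follows. It rebuilds the question's data by hand (the `data.frame()` call is an assumption about how `sample_data` was created) and then inspects the result before and after flattening.

```r
# Rebuild the example data from the question (construction is assumed)
sample_data <- data.frame(
  sample  = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"),
  percent = c(5, 2, 3, 7, 7, 4, 3, 2, 3, 5)
)

aggregate_sample <- aggregate(
  sample_data[, 2], list(sample_data$sample),
  FUN = function(x) c(mn = mean(x), ln = length(x))
)

# Only two real columns: Group.1 and a matrix column x
dim(aggregate_sample)          # 4 2
is.matrix(aggregate_sample$x)  # TRUE
colnames(aggregate_sample$x)   # "mn" "ln"

# Flatten the matrix column into ordinary columns
aggregate_sample1 <- do.call(data.frame, aggregate_sample)
names(aggregate_sample1)       # "Group.1" "x.mn" "x.ln"
```

After flattening, each statistic is its own column, so a viewer that shows one value per column displays both the mean and the count.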

Another option for these operations is dplyr, which handles this directly:

library(dplyr)
sample_data %>% 
     group_by(sample) %>%
     summarise(mn = mean(percent), ln = n())

Or, using data.table:

library(data.table)
setDT(sample_data)[, .(mn = mean(percent), ln = .N), by = sample]
akrun
  • Wow, thanks a bunch. For the future, is there any way to do this without using do.call? – Norman Kuo Nov 11 '19 at 19:14
  • @NormanKuo I would use either `dplyr` or `data.table` instead of `aggregate` because 1) `aggregate` is slow, 2) with `NA` elements you have to adjust other parameters, and 3) the outcome is not always what you expect, as you have experienced – akrun Nov 11 '19 at 19:16
  • Thank you very much – Norman Kuo Nov 11 '19 at 19:17
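
On the follow-up question about avoiding do.call: one possible base R route, not taken from the answer above, is to compute each statistic in its own `aggregate()` call so that every column is a plain vector, then join the two results. This is only a sketch; the data construction and the `.mn`/`.ln` suffixes are assumptions chosen to mirror the names in the question.

```r
# Rebuild the example data from the question (construction is assumed)
sample_data <- data.frame(
  sample  = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"),
  percent = c(5, 2, 3, 7, 7, 4, 3, 2, 3, 5)
)

# One aggregate() call per statistic, each returning a plain vector column
means  <- aggregate(percent ~ sample, data = sample_data, FUN = mean)
counts <- aggregate(percent ~ sample, data = sample_data, FUN = length)

# Join on the grouping column; the suffixes distinguish the two statistics
result <- merge(means, counts, by = "sample", suffixes = c(".mn", ".ln"))
names(result)  # "sample" "percent.mn" "percent.ln"
```

This costs two passes over the data, so for larger tables the dplyr or data.table versions in the answer are likely the better fit.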