I am wondering whether there is a package or a fast way to generate a statistical summary table for the results of a clustering. I imagine I could choose the variables of interest, group by cluster number, and then calculate the mean, max, etc. Is there a package that does this quickly?

Thanks

Ross_you
  • Welcome to StackOverflow! In my opinion, this question is much too vague. It would help a lot if you could (a) specify *exactly* what statistic you want to summarize, (b) supply a [Minimal Reproducible Example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), and (c) tell us what you have tried so far. – Vincent Sep 08 '20 at 18:38

1 Answer

The fastest and easiest way depends on the exact results you want. The simplest approach is probably summary() in base R, which reports per-column statistics for the whole data frame. The more versatile approach is the dplyr package with its group_by() and summarize() functions, which compute statistics separately for each group. For specific types of data, other packages may provide a more practical summary.

An example:

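# Example data: a grouping column and one numeric variable
DF <- data.frame(groups = sample(LETTERS, 20, replace = TRUE),
                 var = runif(20))

# Base R: overall summary of every column (not split by group)
summary(DF)

# dplyr: mean of var and number of rows for each group
library(dplyr)
DF %>%
  group_by(groups) %>%
  summarize(mean_by_group = mean(var),
            number = n())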
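
Applied to the clustering case from the question, the same group_by() and summarize() pattern could look like the sketch below. It is only an illustration under assumptions: the built-in iris data and kmeans() stand in for your own data and clustering method, and the names vars, km, clustered and cluster_summary are made up for the example. It also relies on across(), which needs dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # make the k-means assignment reproducible

# Assumption: iris and kmeans() stand in for your data and clustering step
vars <- c("Sepal.Length", "Petal.Length")  # variables of interest
km <- kmeans(iris[, vars], centers = 3)

# Attach the cluster number to each observation
clustered <- iris %>%
  mutate(cluster = km$cluster)

# One row per cluster: size, plus mean and max of each chosen variable
cluster_summary <- clustered %>%
  group_by(cluster) %>%
  summarize(n = n(),
            across(all_of(vars), list(mean = mean, max = max)),
            .groups = "drop")

cluster_summary
```

The result has columns such as Sepal.Length_mean and Sepal.Length_max. Adding more statistics should only require extending the list passed to across(), for example with min = min or sd = sd.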
Alexlok