I have a data frame that I am trying to group so I can do some basic stats on each group. However, since the column I am using to group is a character vector, none of my attempts have worked so far. Here is a sample:

    Name Value rate
1  SW115    25    3
2  SW115    34    3
3  SW115    25    3
4  SW115    30    3
5  SW115    36    3
6  SW345    32    4
7  SW345    43    4
8  SW345    35    4
9  SW345    24    4
10 SW345    23    4
11 SW445    32    5
12 SW445    33    5
13 SW445    24    5
14 SW445    35    5
15 SW445    25    5

As I said, I would like to group it by "Name" and find the mean, sd, and coefficient of variation (cv) of "Value" for each group. So, in my example, SW115 would be one group and SW345 another. I can do this manually by subsetting, but the original data I am working with has over 5000 rows with about 57 possible groups, and it would take me hours to go through each group manually. I know there has to be a way to get it done with a few lines of code so that it spits out the summary for each group all at once.

I tried converting the column "Name" to numbers (so that each group has a number), but I couldn't get that to work either.

Any suggestions would be greatly appreciated.

Nolage86
  • The fact that `Name` is a character vector shouldn't inhibit your ability to calculate summary statistics by group. What have you tried so far? `tapply`? The `dplyr` or `data.table` packages? – Benjamin Oct 21 '15 at 20:19
  • Have you tried Google? There are sooooo many examples online, for instance **[this](http://stackoverflow.com/questions/21982987/mean-per-group-in-a-data-frame)**, and **[this](http://stats.stackexchange.com/questions/8225/how-to-summarize-data-by-group-in-r)** and **[this](http://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group)** and **[this](http://stackoverflow.com/questions/14035872/ddply-for-sum-by-group-in-r)** and **[this](http://stackoverflow.com/questions/13666780/r-data-table-calculate-sum-of-a-list-of-variables-by-group)** and only G-od knows how many more – David Arenburg Oct 21 '15 at 20:23
  • @DavidArenburg I've been searching all day. I guess I didn't type in the right search terms to find those examples. That is exactly what I am looking for. Merci beaucoup! – Nolage86 Oct 21 '15 at 20:29
  • @Benjamin No, but I will look into those packages as well. Thanks. – Nolage86 Oct 21 '15 at 20:31

2 Answers

Using dplyr, this is pretty simple.

library(dplyr)

x <- your_data  # placeholder: substitute your own data frame (needs columns Name and Value)

x %>%
  group_by(Name) %>%
  dplyr::summarise(mean = mean(Value),
                   sd = sd(Value)) %>%
  mutate(cv = (sd/mean)*100)
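
For a version that runs as-is, here is a minimal sketch that rebuilds the sample data from the question and computes the same three statistics with base R's `tapply` (the function Benjamin's comment mentions). The data frame name `x` follows the answer above; `grp_mean`, `grp_sd`, and `stats` are illustrative names, not part of the original answer. It can serve as a cross-check on what the dplyr pipeline should return.

```r
# Sample data copied from the question
x <- data.frame(
  Name  = rep(c("SW115", "SW345", "SW445"), each = 5),
  Value = c(25, 34, 25, 30, 36, 32, 43, 35, 24, 23, 32, 33, 24, 35, 25),
  rate  = rep(c(3, 4, 5), each = 5)
)

# Per-group mean and standard deviation of Value
grp_mean <- tapply(x$Value, x$Name, mean)
grp_sd   <- tapply(x$Value, x$Name, sd)

# Coefficient of variation in percent, same formula as the mutate() step
stats <- data.frame(
  Name = names(grp_mean),
  mean = as.vector(grp_mean),
  sd   = as.vector(grp_sd),
  cv   = as.vector(grp_sd / grp_mean) * 100
)
print(stats)
```

Because the grouping column is passed to `tapply` as-is, it does not matter that `Name` is a character vector; no conversion to numbers is needed.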
maloneypatr
Creating your data:

name <- c(rep("SW115", 5), rep("SW345", 5), rep("SW445", 5))
Value <- c(25,34,25,30,36,32,43,35,24,23,32,33,24,35,25)
rate <- c(rep(3, 5), rep(4, 5), rep(5, 5))
df <- data.frame(name, Value, rate)

This is what you want:

# Mean of Value and rate within each name
aggregate(df[, 2:3], list(df$name), mean)
# Standard deviation of Value and rate within each name
aggregate(df[, 2:3], list(df$name), sd)

Swap in whichever statistic you want in place of `mean` or `sd`.
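
The question also asks for the coefficient of variation, which the two calls above do not produce. One possible sketch, assuming the same `df` as in this answer, passes `aggregate` a function that returns all three statistics at once; the helper name `group_stats` and the result names `res` and `out` are made up for illustration.

```r
# Same data as in the answer above
name  <- c(rep("SW115", 5), rep("SW345", 5), rep("SW445", 5))
Value <- c(25, 34, 25, 30, 36, 32, 43, 35, 24, 23, 32, 33, 24, 35, 25)
rate  <- c(rep(3, 5), rep(4, 5), rep(5, 5))
df <- data.frame(name, Value, rate)

# Hypothetical helper: mean, sd, and cv (in percent) of one group's values
group_stats <- function(v) {
  c(mean = mean(v), sd = sd(v), cv = sd(v) / mean(v) * 100)
}

# aggregate() stores the three results as a matrix column called Value
res <- aggregate(Value ~ name, data = df, FUN = group_stats)

# Flatten the matrix column into ordinary columns:
# name, Value.mean, Value.sd, Value.cv
out <- do.call(data.frame, res)
print(out)
```

The `do.call(data.frame, res)` step is only there to turn the matrix column into regular columns, which makes the result easier to filter or export.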

David Arenburg
Developer