How can I get the mean of the column by the variable name?

Question

I have a dataframe like:

Col_name  Col_X    Col_Y    Col_Z
BoB         2        3          3    
BoB         3        4          3
Carl        4        5          2
Carl        2        3          3
Eva         5        2          5
Bob         1        1          2

I want to get the mean of each column, grouped by name. So I want to get this df:

Col_name   Col_X    Col_Y      Col_Z
BOB        2        2.33       2.33
Carl       3        4          2.5
Eva        5        2          5

Does anyone know how to do this?

score 1 · Answer 1 · answered Jun 06 '16 at 09:45

Here is one approach with dplyr. Note that your names differ in case (BoB vs. Bob), so I am not sure how you got your expected output; I convert them all to lower case so that they fall into one group:

library(dplyr)
df %>%
  mutate(Col_name = tolower(Col_name)) %>%
  group_by(Col_name) %>%
  summarise_each(funs(mean))

Output as follows:

Source: local data frame [3 x 4]

  Col_name Col_X    Col_Y    Col_Z
     <chr> <dbl>    <dbl>    <dbl>
1      bob     2 2.666667 2.666667
2     carl     3 4.000000 2.500000
3      eva     5 2.000000 5.000000
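
For readers on a newer dplyr: summarise_each() and funs() have since been deprecated. Below is a minimal sketch of the same aggregation written with across(), assuming dplyr 1.0.0 or later. The data frame is rebuilt from the table in the question; the stringsAsFactors argument and the name `result` are my additions, not part of the original answer.

```r
library(dplyr)

# Data copied from the question, including the mixed-case "BoB"/"Bob" rows
df <- data.frame(
  Col_name = c("BoB", "BoB", "Carl", "Carl", "Eva", "Bob"),
  Col_X    = c(2, 3, 4, 2, 5, 1),
  Col_Y    = c(3, 4, 5, 3, 2, 1),
  Col_Z    = c(3, 3, 2, 3, 5, 2),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(Col_name = tolower(Col_name)) %>%   # merge "BoB" and "Bob" into "bob"
  group_by(Col_name) %>%
  summarise(across(everything(), mean))      # mean of every non-grouping column

print(result)
```

The values should match the output shown above (bob, carl and eva as the three groups).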

score 0 · Answer 2 · Martin Schmelzer · 2016-06-06T09:52:47.610

Use package dplyr and do the following:

library(dplyr)
df <- data.frame(Col_name = c("Bob", "Bob", "Carl", "Carl"), 
                 Col_X = c(2,3,4,2), Col_Y = c(3,4,5,3))
df %>% group_by(Col_name) %>% summarise_each(funs(mean(.)))

This takes the data (df), groups it by the column Col_name, and then applies the function mean to every remaining column within each distinct group.

Output:

Source: local data frame [2 x 3]

  Col_name Col_X Col_Y
    (fctr) (dbl) (dbl)
1      Bob   2.5   3.5
2     Carl   3.0   4.0
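
The same group-then-average idea can also be sketched without dplyr. This is not what the answer above uses: it swaps in base R's aggregate() with a formula, applied to the same example df. The name `result` is only illustrative.

```r
# Same example data as above
df <- data.frame(Col_name = c("Bob", "Bob", "Carl", "Carl"),
                 Col_X = c(2, 3, 4, 2), Col_Y = c(3, 4, 5, 3))

# "." on the left-hand side means "all columns other than Col_name";
# FUN = mean is applied to each of them within every Col_name group
result <- aggregate(. ~ Col_name, data = df, FUN = mean)

print(result)
```

This gives the same two rows of means (Bob: 2.5 and 3.5, Carl: 3 and 4), returned as a plain data.frame.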

edited Jun 06 '16 at 09:52

answered Jun 06 '16 at 09:43

Martin Schmelzer

Oooops....looks like we both just posted a very similar answer right at the same time. – Gopala Jun 06 '16 at 09:46
I think you are missing the point of upper/lower case names in `Col_name` (look at the Bobs in the example) – talat Jun 06 '16 at 09:47
Well I see a difference of 2 minutes there :P – Martin Schmelzer Jun 06 '16 at 09:47
Did I miss the point or not? Question is not explicit about that... – Martin Schmelzer Jun 06 '16 at 09:51

score 0 · Answer 3 · answered Jun 06 '16 at 09:48

Using dplyr:

require(dplyr)

a <- data.frame(Col_name = c(rep("Bob", 2), rep("Carl", 2), "Eva", "Bob"),
                Col_X = runif(6),
                Col_Y = runif(6),
                Col_Z = runif(6))

a %>%
  group_by(Col_name) %>% 
  summarise_each(funs(mean))

answered Jun 06 '16 at 09:48

Manuel R

all had the same idea, so it seems :-) – Manuel R Jun 06 '16 at 09:49

score 0 · Answer 4 · answered Jun 06 '16 at 09:49

With the data.table package you can do:

# creating example data
library(data.table)
dt <- data.table(Col_name = c("Bob", "Bob", "Carl", "Carl"), Col_X = c(2,3,4,2), Col_Y = c(3,4,5,3))

# aggregate
dt[, lapply(.SD, mean), by = Col_name]

which returns:

   Col_name Col_X Col_Y
1:      Bob   2.5   3.5
2:     Carl   3.0   4.0
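
To also cover the mixed-case names from the question (BoB vs. Bob) with data.table, one possible sketch is to lower-case Col_name by reference before aggregating. The data below is rebuilt from the question's table; `dt_full` and `result` are illustrative names that do not appear in the original answer.

```r
library(data.table)

# Full data from the question, including the mixed-case names
dt_full <- data.table(
  Col_name = c("BoB", "BoB", "Carl", "Carl", "Eva", "Bob"),
  Col_X    = c(2, 3, 4, 2, 5, 1),
  Col_Y    = c(3, 4, 5, 3, 2, 1),
  Col_Z    = c(3, 3, 2, 3, 5, 2)
)

# Normalise the grouping column in place so "BoB" and "Bob" form one group
dt_full[, Col_name := tolower(Col_name)]

# .SD holds every column except the one named in `by`
result <- dt_full[, lapply(.SD, mean), by = Col_name]

print(result)
```

Groups come back in order of first appearance, so the rows should be bob, carl and eva.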
