
I have a data file like the following example, but much larger:

names    num    Y1  Y2
William  1  4.71    7.4
William  2  3.75    8
William  3  4.71    7.9
Katja    1  5.83    8.5
Katja    2  5.17    7.1
Katja    3  6.08    7.4
Aroma    1  4.04    7.5
Aroma    2  5       6.9
Aroma    3  4.3     7.9
...

I need to calculate the mean of Y1 and of Y2 for each name (each name appears three times in the first column). Then I want to make a bar chart of the per-name averages, one for Y1 and one for Y2, with the names on the x axis and the mean on the y axis. Could anybody help me with this?
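To make the example reproducible, as the first comment asks, the sample rows above can be read straight into a data frame with read.table(). This is a sketch that includes only the nine rows shown; for the real file you would pass its path instead of `text =` (the file name is not given in the question, so any path you use is your own).

```r
# Read the sample rows from the question into a data frame.
# For the real data, something like read.table("your_file.txt", header = TRUE)
# would be used instead; "your_file.txt" is a hypothetical name.
df <- read.table(header = TRUE, text = "
names    num  Y1    Y2
William  1    4.71  7.4
William  2    3.75  8
William  3    4.71  7.9
Katja    1    5.83  8.5
Katja    2    5.17  7.1
Katja    3    6.08  7.4
Aroma    1    4.04  7.5
Aroma    2    5     6.9
Aroma    3    4.3   7.9
")

str(df)  # 9 rows: names, num, Y1, Y2
```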

user2772716
  • Welcome to SO! Please read [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What have you tried? – sgibb Sep 12 '13 at 13:12
  • I am very new to R and tried to use mean, but could not get anything out of it. How do I use aggregate? – user2772716 Sep 12 '13 at 13:20

2 Answers


You can use aggregate for this; df[, -2] drops the num column so that only Y1 and Y2 are averaged. See ?aggregate for further details.

> DF <- aggregate(.~names, FUN=mean, data=df[, -2])
> DF
    names       Y1       Y2
1   Aroma 4.446667 7.433333
2   Katja 5.693333 7.666667
3 William 4.390000 7.766667

Take a look at this post for other ways of computing the mean of each group.
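As one example of such an alternative (a sketch, not necessarily what the linked post shows), base R's tapply() averages one column per group, and sapply() over split() averages several columns at once. The data frame below is rebuilt from the sample rows in the question.

```r
# Sample rows from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5.00, 4.30),
  Y2    = c(7.4, 8.0, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9)
)

# tapply(): one variable at a time, returns a vector named by group
mean_Y1 <- tapply(df$Y1, df$names, mean)
mean_Y2 <- tapply(df$Y2, df$names, mean)

# split() + colMeans(): split the Y1/Y2 columns by name, then average each piece.
# sapply() returns a matrix with rows Y1, Y2 and one column per name.
means <- sapply(split(df[, c("Y1", "Y2")], df$names), colMeans)
t(means)  # transpose so that each row is a name
```

Groups come out in alphabetical order (Aroma, Katja, William), the same order aggregate uses.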

For the bar plots, use base R's barplot function, although there are other alternatives such as ggplot2 graphics.

barplot(DF[,2], names.arg=DF$names, ylab="mean of Y1", las=1) # for Y1
barplot(DF[,3], names.arg=DF$names, ylab="mean of Y2", las=1) # for Y2
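If you would rather see both charts in one window, a minimal self-contained sketch (assuming the sample rows from the question) puts them side by side with par(mfrow = ...) and restores the previous graphics settings afterwards. It selects columns by name (DF$Y1) rather than by position, which keeps working if the column order changes.

```r
# Sample rows from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5.00, 4.30),
  Y2    = c(7.4, 8.0, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9)
)

# Per-name means of Y1 and Y2 (column 2, num, is dropped)
DF <- aggregate(. ~ names, FUN = mean, data = df[, -2])

op <- par(mfrow = c(1, 2))  # one row, two plotting panels
barplot(DF$Y1, names.arg = DF$names, ylab = "mean of Y1", las = 1)
barplot(DF$Y2, names.arg = DF$names, ylab = "mean of Y2", las = 1)
par(op)  # restore the previous graphics settings
```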

which produce:

[Figure: bar charts of the mean of Y1 and the mean of Y2 for each name]

As you are very new to R, I recommend reading An Introduction to R, which is a good starting point for learning the basics of R.

Jilber Urbina

Using the sqldf package (assuming df is your data frame):

library(sqldf)
sqldf("SELECT names, avg(Y1) as mean_Y1, avg(Y2) as mean_Y2 FROM df GROUP BY names")
Scott Ritchie
  • It allows you to execute `SQL` queries on data frames. Broken down, the query groups your table by the names column, then selects the name, the mean of Y1, and the mean of Y2. – Scott Ritchie Sep 13 '13 at 00:05
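For comparison, the same GROUP BY query can be expressed in base R without installing sqldf. This is a sketch assuming the sample rows from the question; the columns are renamed afterwards to match the `AS mean_Y1` / `AS mean_Y2` aliases in the SQL.

```r
# Sample rows from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5.00, 4.30),
  Y2    = c(7.4, 8.0, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9)
)

# Base R counterpart of:
#   SELECT names, avg(Y1) AS mean_Y1, avg(Y2) AS mean_Y2 FROM df GROUP BY names
# cbind(Y1, Y2) on the left-hand side averages both columns per group.
res <- aggregate(cbind(Y1, Y2) ~ names, data = df, FUN = mean)
names(res) <- c("names", "mean_Y1", "mean_Y2")  # mimic the SQL aliases
res
```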