Suppose there are 10 students divided into 3 groups. Each student will take a test and get a score. The data is stored in a data.frame variable named d as follows:

  Group_no        Grade
    1               98
    1               89
    1               99
    2               87
    2               94
    2               91
    3               86
    3               85
    3               93
    3               90

Question 1: I want to calculate the mean score for each group. Can you give me a hint on how to do so?

Question 2: How can I count how many groups there are in the data, supposing I do not know the number in advance?

Thanks

Ritchie Sacramento
Hao Zhang

1 Answer

You can use aggregate for the mean (here x is the data frame, named d in the question):

> aggregate(Grade ~ Group_no, data=x, FUN=mean)
  Group_no    Grade
1        1 95.33333
2        2 90.66667
3        3 88.50000
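
To make the call above reproducible, here is a minimal sketch that rebuilds the question's data first. The name x follows this answer's code (the question calls it d), and group_means is just an illustrative variable name.

```r
# Rebuild the ten rows from the question; x matches the name used in this answer
x <- data.frame(
  Group_no = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Grade    = c(98, 89, 99, 87, 94, 91, 86, 85, 93, 90)
)

# One row per group, with the mean of Grade in each group
group_means <- aggregate(Grade ~ Group_no, data = x, FUN = mean)
print(group_means)
```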

To get the count per-group, table is handy:

> table(x$Group_no)

1 2 3 
3 3 4 
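
Question 2 asks for the number of groups rather than the size of each one. A short sketch of two ways to get that, assuming the same data frame x as above (n_groups and n_groups_alt are illustrative names):

```r
# Same data as in the question, stored under the name x
x <- data.frame(
  Group_no = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Grade    = c(98, 89, 99, 87, 94, 91, 86, 85, 93, 90)
)

# Count the distinct group labels
n_groups <- length(unique(x$Group_no))

# Equivalent: the table has one entry per group, so its length is the group count
n_groups_alt <- length(table(x$Group_no))

print(n_groups)
```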

And you can ask aggregate to give you both:

> aggregate(Grade ~ Group_no, data=x, FUN=function(g) c(mean=mean(g), count=length(g)))
  Group_no Grade.mean Grade.count
1        1   95.33333     3.00000
2        2   90.66667     3.00000
3        3   88.50000     4.00000
Matthew Lundberg
  • Thanks for your help. I still have two questions. First, in "data=x", x is the data.frame, right? Second, suppose I add a new column called "Grade2" and want to calculate the correlation of the two grades within each group. I tried the following: aggregate(c(Grade, Grade2)~Group_no, data=x, FUN=cor), but it did not work. The error message is: variable lengths differ. Can you help me again? Thanks. – Hao Zhang Sep 19 '15 at 21:05
  • You probably want `cbind` instead of `c` there, but this would be better addressed by asking a new question. – Matthew Lundberg Sep 20 '15 at 04:13
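
For the per-group correlation raised in the comments, one possible approach is to split the data frame by group and call cor on each piece. This is only a sketch: it is not the cbind route suggested above (aggregate applies FUN to each column separately, so it does not hand cor both columns at once), and the Grade2 values below are made up purely for illustration.

```r
x <- data.frame(
  Group_no = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Grade    = c(98, 89, 99, 87, 94, 91, 86, 85, 93, 90),
  # Hypothetical second score; these values are NOT from the question
  Grade2   = c(90, 85, 95, 80, 92, 88, 84, 80, 91, 89)
)

# split() yields one data frame per group; cor() then sees both columns together
per_group_cor <- sapply(split(x, x$Group_no),
                        function(g) cor(g$Grade, g$Grade2))
print(per_group_cor)
```

The result is a named numeric vector with one correlation per group label.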