Calculate various means from data based on values

Question

I have a large spreadsheet of team member ratings from which I want to calculate how people rated themselves, how they were rated by everyone else on their team, and how they rated everyone else on their team (all averages). I've been trying to do this with dplyr because I have used it before and I think that the group_by will simplify things when doing these calculations. I haven't been able to figure it out so I'm asking for help. I'll try to explain my thinking.

Here's an example dataset:

data <- read.table(text="
Team    Rater   A1  B1  C1  A2  B2  C2  A3  B3  C3  A4  B4  C4  A5  B5  C5  A6  B6  C6
1   1   2   4   4   2   1   5   2   2   3   4   4   4   3   2   1   NA  NA  NA
1   2   4   5   4   4   5   1   1   1   5   5   3   1   4   5   2   NA  NA  NA
1   3   2   1   4   3   5   5   2   1   5   1   1   4   1   1   4   NA  NA  NA
1   4   4   3   4   3   5   1   3   1   3   5   5   5   5   2   2   NA  NA  NA
1   5   3   4   5   4   3   3   5   5   4   1   4   5   5   5   1   NA  NA  NA
2   1   3   5   3   4   1   1   3   4   3   4   3   2   2   2   3   3   5   3
2   2   3   2   3   1   1   3   5   5   1   5   2   3   2   2   1   3   3   2
2   3   3   2   3   3   5   2   4   1   1   1   4   5   3   5   2   1   1   3
2   4   3   3   5   4   3   5   3   1   4   3   1   1   4   2   4   3   5   2
2   5   5   2   1   2   5   5   3   3   1   4   1   5   5   3   3   4   2   5
2   6   3   2   3   5   4   3   2   1   5   4   3   1   1   1   4   2   2   1",header = TRUE)

Each rater provides input on multiple questions for each other team member. The way it is organized, rater 1 answers A1, B1, and C1 about themselves. Rater 2 answers A2, B2, and C2 about themselves, and so on.

Self Ratings

To get someone's rating of themselves I figured it would be something like:

data %>%
  group_by(Team) %>%
  mutate(self = rowMeans(select(.,ends_with(Rater)), na.rm = TRUE))

It'd be convenient if the column selection was dynamically based on their rater number.

From Others

I was thinking of calculating this based on the average overall rating of that person except the self rating:

data %>%
  group_by(Team) %>%
  mutate(from = ( (mean(ends_with(Rater)) * n() - self ) / ( n() - 1 ) ) )

Of Others

For this column calculation I was thinking something along the lines of:

data %>%
  mutate(of = select(A1:C6, -(ends_with(Rater))) %>% rowMeans(na.rm = TRUE))

(similar to this answer)

Results

Here is an example of what I'm looking for as new columns:

Team    Rater   self    from    of
1   1   3.33    3.58    2.75
1   2   3.33    3.33    3.33
1   3   2.67    2.92    2.67
1   4   5.00    3.08    3.00
1   5   3.67    2.67    3.83

If you can help with any of these parts I'd appreciate it!

score 3 · Accepted Answer · answered Feb 22 '16 at 21:22

I would recommend first transforming your data into a "tidy" format with tidyr like such

library(tidyr)
tidy <- data %>% gather(QV,Rating,-Team,-Rater) %>%
    separate(QV, into=c("Quest","Rated"), sep=1) %>%
    mutate(Rated=as.numeric(Rated)) %>%
    filter(!is.na(Rating))

This transforms your data to have the following shape

  Team Rater Quest Rated Rating
1    1     1     A     1      2
2    1     2     A     1      4
3    1     3     A     1      2
4    1     4     A     1      4
5    1     5     A     1      3
6    2     1     A     1      3
...

So we turn your data into a long format. Then you can perform each of the queries a bit more directly and merge them together

Reduce(left_join, list(
  tidy %>% group_by(Team, Rater) %>% filter(Rated==Rater) %>% summarize(self=mean(Rating)),
  tidy %>% group_by(Team, Rated) %>% filter(Rated!=Rater) %>% summarize(others=mean(Rating)) %>% rename(Rater=Rated),
  tidy %>% group_by(Team, Rater) %>% filter(Rated!=Rater) %>% summarize(of=mean(Rating))
))

This returns

    Team Rater     self   others       of
   (int) (dbl)    (dbl)    (dbl)    (dbl)
1      1     1 3.333333 3.583333 2.750000
2      1     2 3.333333 3.333333 3.333333
3      1     3 2.666667 2.916667 2.666667
4      1     4 5.000000 3.083333 3.000000
5      1     5 3.666667 2.666667 3.833333
6      2     1 3.666667 2.866667 2.866667
7      2     2 1.666667 3.466667 2.800000
8      2     3 2.000000 2.933333 2.866667
9      2     4 1.666667 3.133333 3.400000
10     2     5 3.666667 2.533333 3.200000
11     2     6 1.666667 3.000000 2.800000

Calculate various means from data based on values

Self Ratings

From Others

Of Others

Results

1 Answers1