Get info per item with students grouped by schools in R

Question

My main goal: I have a dataframe of lots of students, their responses for 45 items in a test, and the correct answer also for each of the 45 items. Some of the students are from the same school(We have an ID for each school).

What I need is to get the percentage of people who answered the item correct, within each school, and for each item.

I was able to separate the vectors, and correct the test for each student, then I have a data frame of 0's and 1's like in the picture(each line is a student).

Then I was able to get what I want for the question 1 with:

 escolas <- group_by(acertos, School_ID)
 percentual <- summarize(escolas, count = n(), P1 = (sum(Q1)/count)*100)

I could type 45 of those lines, changing the question reference, but I am pretty sure there's another way to do that but I could not figure this out.

Reproducible example, 20 students, 4 schools, and 5 items:

Student_ID = c(1:20)
School_ID = c(rep(1,5),rep(2,5), rep(3,5), rep(4,5))

Q1 = 1*(runif(20) < 0.5)
Q2 = 1*(runif(20) < 0.5)
Q3 = 1*(runif(20) < 0.5)
Q4 = 1*(runif(20) < 0.5)
Q5 = 1*(runif(20) < 0.5)

data <- tibble(Student_ID, School_ID, Q1, Q2, Q3, Q4, Q5)
data
    Student_ID School_ID    Q1    Q2    Q3    Q4    Q5


           1         1     0     1     1     0     1
           2         1     0     0     1     1     0
           3         1     0     1     0     0     0
           4         1     0     0     0     0     1
           5         1     0     1     1     1     1
           6         2     0     0     1     0     1
           7         2     0     0     1     1     1
           8         2     1     1     1     0     0
           9         2     0     0     1     0     0
          10         2     1     1     1     1     1

What I wish is something like this

        School_ID    Q1     Q2   Q3    Q4     Q5


           1         70%   50%   30%   20%   40%   
           2         60%   40%   20%   10%   30%

Meaning:

Considering all students from school 1(and only them),70% got Q1 right.

Considering all students from school 2(and only them), 30% got Q5 right, and so on. For all schools and all items.

I hope this can make it easier for your to try and have a better understanding of the challenge.

Please don't post pictures of data or external links to data. Instead, post data in a copy/pasteable text format. [See this FAQ for more advice on this topic](https://stackoverflow.com/q/5963269/903061). A small, copy/pasteable, reproducible example is perfect--about 10 rows, just a few columns, whatever is needed to illustrate the problem. — Gregor Thomas, Feb 21 '19 at 14:33
You might also consider transposing your data and grouping by question and then summarizing. Please post data so others can help. — Mike, Feb 21 '19 at 14:36

score 0 · Answer 1 · answered Feb 21 '19 at 14:40

So I attempted to replicate you data-frame as follows:

N = 200
data <- tibble(
  School_ID = 1:N,
  Q1 = 1*(runif(N) < 0.5),
  Q2 = 1*(runif(N) < 0.5),
  Q3 = 1*(runif(N) < 0.5),
  Q4 = 1*(runif(N) < 0.5),
  Q5 = 1*(runif(N) < 0.5),
)

Next, you can use tidyr or really just import the tidyverse as do this:

# this
library(tidyverse)
# or  this
library(tidyr)

data %>% 
  group_by(School_ID) %>% 
  gather(
    `Question ID`,
    Score,
    -School_ID
  ) %>% 
  group_by(School_ID, `Question ID`) %>% 
  summarise(
    Proportion = mean(Score)
  )

Explanation: first I flatten the matrix so after the gather line, you will have a table with three columns: School_ID, Question ID and Score. Then I group by the school id and question id and apply the transformation.

Bonus: you add a "spread" statement to transpose your data again, so that you have each question in its own column (see the last three lines below)

data %>% 
  group_by(School_ID) %>% 
  gather(
    `Question ID`,
    Score,
    -School_ID
  ) %>% 
  group_by(School_ID, `Question ID`) %>% 
  summarise(
    Proportion = mean(Score)
  ) %>% 
  spread(
    `Question ID`,
    Proportion
  )

Each line is a student, and schools ID of some students are the same, in your test, each line has a unique School ID. — paimfp, Feb 21 '19 at 15:38
I believe that it's irrelevant to what you are trying to achieve here, isn't it? — asosnovsky, Feb 23 '19 at 00:00

Get info per item with students grouped by schools in R

1 Answers1