My main goal: I have a data frame with many students, their responses to the 45 items of a test, and the correct answer for each of the 45 items. Some of the students are from the same school (we have an ID for each school).
What I need is the percentage of students who answered each item correctly, within each school.
I was able to separate the vectors and score the test for each student, so I now have a data frame of 0's and 1's like the one in the picture (each row is a student).
I was then able to get what I want for question 1 with:
escolas <- group_by(acertos, School_ID)
percentual <- summarize(escolas, count = n(), P1 = (sum(Q1)/count)*100)
I could type 45 of those lines, changing the question reference each time, but I am pretty sure there is a better way to do it. I just could not figure it out.
Reproducible example with 20 students, 4 schools, and 5 items (only the first 10 rows are shown in the output below):
library(dplyr)  # provides tibble(), group_by() and summarize()
Student_ID = c(1:20)
School_ID = c(rep(1,5),rep(2,5), rep(3,5), rep(4,5))
Q1 = 1*(runif(20) < 0.5)
Q2 = 1*(runif(20) < 0.5)
Q3 = 1*(runif(20) < 0.5)
Q4 = 1*(runif(20) < 0.5)
Q5 = 1*(runif(20) < 0.5)
data <- tibble(Student_ID, School_ID, Q1, Q2, Q3, Q4, Q5)
data
Student_ID School_ID Q1 Q2 Q3 Q4 Q5
1 1 0 1 1 0 1
2 1 0 0 1 1 0
3 1 0 1 0 0 0
4 1 0 0 0 0 1
5 1 0 1 1 1 1
6 2 0 0 1 0 1
7 2 0 0 1 1 1
8 2 1 1 1 0 0
9 2 0 0 1 0 0
10 2 1 1 1 1 1
What I want is something like this (illustrative values):
School_ID Q1 Q2 Q3 Q4 Q5
1 70% 50% 30% 20% 40%
2 60% 40% 20% 10% 30%
Meaning:
Considering all students from school 1 (and only them), 70% got Q1 right.
Considering all students from school 2 (and only them), 30% got Q5 right, and so on, for all schools and all items.
I hope this makes it easier for you to try it out and to understand the challenge.
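For reference, here is a minimal sketch of the kind of one-step summary I am after. It assumes dplyr 1.0.0 or later, where `across()` is available, and that every item column starts with "Q". The `scores` data frame and its values are hypothetical, made up only for this illustration. It is one possible approach, not necessarily the best one.

```r
library(dplyr)

# Hypothetical scored data: 1 = correct, 0 = incorrect (made-up values)
scores <- tibble(
  Student_ID = 1:8,
  School_ID  = c(1, 1, 1, 1, 2, 2, 2, 2),
  Q1 = c(1, 0, 1, 1, 0, 0, 1, 0),
  Q2 = c(1, 1, 0, 0, 1, 1, 1, 1)
)

# The mean of a 0/1 column is the proportion correct,
# so multiplying by 100 gives the percentage per school and item.
percentual <- scores %>%
  group_by(School_ID) %>%
  summarize(
    count = n(),                                   # students per school
    across(starts_with("Q"), ~ mean(.x) * 100)     # percent correct per item
  )

print(percentual)
```

With the real data, the same call would apply to the data frame of 0's and 1's in place of `scores`, since `starts_with("Q")` would pick up all 45 item columns without typing each one.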