Suppose I have data of the following type:
    df <- data.frame(student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
                     factor = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
                     year = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                     count1 = c(0, 1, 0, 0, 0, 1, 0, 0, 0, 0),
                     count2 = c(1, 0, 0, 0, 0, 0, 0, 1, 0, 0))
I need a more efficient way than the typical apply() functions to analyze the student and factor columns within a given year. When a student stays in the same factor level within a given year, the function should return a count of zero. When a student appears in more than one factor level within a given year, the count should be incremented by one (i + 1) for each instance of the student in a separate factor level.
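To make the within-year rule concrete, here is a minimal base R sketch. It assumes that "count" means the number of distinct factor levels a student has in a year, minus one. The names `pairs`, `within_year`, and `within_count` are made up for illustration and are not part of my data.

```r
# Toy data copied from the question (count1/count2 omitted; they are not needed here).
df <- data.frame(
  student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
  factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
  year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# Drop repeated (student, year, factor) rows so each level is counted once.
pairs <- unique(df[, c("student", "year", "factor")])

# Number of distinct factor levels per student and year.
within_year <- aggregate(factor ~ student + year, data = pairs, FUN = length)

# Assumed rule: every level beyond the first adds one to the count.
within_year$within_count <- within_year$factor - 1L
within_year$factor <- NULL
```

This yields one row per student and year. Under the assumed rule, only S2 in year 1 (levels A and B) gets a non-zero count.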
I would also like a separate count that tracks students across years. For instance, a student who stays in the same factor level across years receives a count of zero. If a student is found in different factor levels in different years, the count is incremented by one (i + 1) for each such instance.
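The between-year rule is less clear-cut, so the following base R sketch rests on one possible reading: a factor level counts toward the between-year total if the student first appears in it in a year later than that student's first year in the data. That reading is an assumption, and the names `first_seen`, `start`, `is_new`, and `between_count` are illustrative only.

```r
# Toy data copied from the question (count1/count2 omitted; they are not needed here).
df <- data.frame(
  student = c("S1", "S2", "S3", "S4", "S5", "S2", "S6", "S1", "S7", "S8"),
  factor  = c("A", "A", "A", "A", "A", "B", "B", "C", "C", "D"),
  year    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
)

# First year in which each (student, factor) pair appears.
first_seen <- aggregate(year ~ student + factor, data = df, FUN = min)

# First year in which each student appears at all.
first_seen$start <- ave(first_seen$year, first_seen$student, FUN = min)

# Assumed rule: a level counts if it first shows up after the student's first year.
first_seen$is_new <- as.integer(first_seen$year > first_seen$start)

# Sum the flags per student to get the between-year count.
between_year <- aggregate(is_new ~ student, data = first_seen, FUN = sum)
names(between_year)[2] <- "between_count"
```

With the toy data, S1 moves from A in year 1 to C in year 2 and gets a count of one. S2 has two levels, but both occur in year 1, so under this reading its between-year count stays at zero.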
There are over 10k observations, so my attempts with the *apply family have been unproductive. Specifically, I have been able to count only the first unique instance of each student and factor, not all unique instances of a student (unique id) and factor. Individuals may be repeated either within or across years.
The ideal output is as follows:
Student1,Factor.Count(Within Year),Factor.Count(Between Year)