First post ever here, but I've been reading a lot so thanks!
I have a huge data frame with many columns, but only 4 matter here: dates, classes, names and grades.
For each date, I have several classes, each containing several students (names; always the same students in their respective classes), and each student has exactly ONE grade per date.
On the first date, I retrieve the best student per class (the one with the highest grade) using `max()`.
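For the first date on its own, the per-class maximum can be found without the inner `for (z in 1:15)` loop. A minimal sketch using base R's `ave()`, assuming a toy data frame `day1` (hypothetical name) whose columns `Classes`, `Names` and `Grades` follow the example data at the end of this post rather than the column numbers used in my code:

```r
# Toy data for a single date (hypothetical; a subset of the 12/02 rows of the example)
day1 <- data.frame(
  Classes = c(1, 1, 1, 2, 2, 2),
  Names   = c("John", "Bob", "Terry", "Sandra", "Jim", "Audrey"),
  Grades  = c(10, 5, 3, 3, 6, 7),
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the maximum grade of that row's class
class_max <- ave(day1$Grades, day1$Classes, FUN = max)

# Keep the rows whose grade equals their class maximum
best <- day1[day1$Grades == class_max, ]
print(best$Names)
```

If two students tied for the maximum, this would keep both rows for that class.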
However, for the next dates, I want to do the following:
- If the previous best student is still in the top 3 of his class, then we consider him to still be the best one.
- Otherwise, the student ranked 1st on the new date becomes the best one.
Hence, every date depends on the previous one.
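The rule for one class on one date can be written as a small step function. This is only a sketch: `pick_best()` and its arguments are hypothetical names, and it assumes the students and their grades for the current date are passed as two parallel vectors:

```r
# Hypothetical helper: choose the best student of ONE class on ONE date.
#   prev_best : name picked on the previous date (NA on the first date)
#   students  : names of the students in the class on the current date
#   grades    : their grades on the current date (same order as students)
pick_best <- function(prev_best, students, grades) {
  ranked <- students[order(grades, decreasing = TRUE)]  # best student first
  if (!is.na(prev_best) && prev_best %in% head(ranked, 3)) {
    prev_best  # previous best is still in the top 3: keep him
  } else {
    ranked[1]  # otherwise the new 1st student takes over
  }
}

# Class 2 on 13/02 from the example: Audrey is only 4th, so Jim (1st) is picked
pick_best("Audrey",
          c("Sandra", "Denis", "Tim", "Jim", "Audrey"),
          c(7, 5, 1, 8, 2))
```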
Is it possible to do this without a loop?
I can't figure out how, since every iteration depends on the previous one.
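The dependency on the previous date does not go away, but the explicit loop can be expressed as a fold with base R's `Reduce()` and `accumulate = TRUE`, which feeds each date's pick into the next call. A sketch for a single class, assuming the grades are stored in a matrix with one row per date (in chronological order) and one named column per student; the values are class 2 of the example data, and `pick_best()` is a hypothetical step function:

```r
# Class 2 from the example: one row per date (12/02, 13/02, 14/02), one column per student
grades <- rbind(
  c(Sandra = 3, Denis = 4, Tim = 5, Jim = 6, Audrey = 7),
  c(Sandra = 7, Denis = 5, Tim = 1, Jim = 8, Audrey = 2),
  c(Sandra = 7, Denis = 6, Tim = 5, Jim = 4, Audrey = 3)
)

# Hypothetical step function: previous pick + one date's named grades -> new pick
pick_best <- function(prev_best, day) {
  ranked <- names(sort(day, decreasing = TRUE))  # best student first
  if (!is.na(prev_best) && prev_best %in% head(ranked, 3)) prev_best else ranked[1]
}

# Split the matrix into a list of named grade vectors, one per date
days <- lapply(seq_len(nrow(grades)), function(d) grades[d, ])

# Reduce() passes each result into the next call; accumulate = TRUE keeps every
# intermediate pick, and [-1] drops the initial NA
best <- Reduce(pick_best, days, accumulate = TRUE, init = NA_character_)[-1]
print(best)
```

`Reduce()` still iterates internally, so this mostly helps readability rather than speed.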
Here is my code. Apologies if it's not optimized!
Thanks a lot :)
```r
# horizon is the vector of dates; dataaf contains all the data
# (column 1 = date, 3 = name, 13 = grade, 14 = class)
for (i in 1:(length(horizon) - 1)) {
  # keep only the rows for date i
  uni3 <- dataaf[dataaf[, 1] == as.numeric(horizon[i]), ]
  selecdate <- data.frame()  # best student per class for this date
  for (z in 1:15) {  # 15 classes
    selecsec <- na.omit(uni3[uni3[, 14] == z, ])
    # top 3 rows of class z on this date, best grade first
    # (searching within the class, not in all of uni3, so another class's equal grade can't match)
    top3 <- head(selecsec[order(selecsec[, 13], decreasing = TRUE), ], 3)
    if (i == 1) {
      # first date: take the best student of the class
      selecsec <- top3[1, ]
    } else {
      # keep the student if he is still in the top 3, else take the new best one
      lastsec <- na.omit(lastdate[lastdate[, 14] == z, ])  # last results
      if (any(top3[, 3] %in% lastsec[, 3])) {
        selecsec <- lastsec
      } else {
        selecsec <- top3[1, ]
      }
    }
    selecdate <- rbind(selecdate, selecsec)
  }
  lastdate <- selecdate  # recording the last results
}
```
EDIT: Here is an example.
- On date 1, John and Audrey are selected in classes 1 and 2 respectively.
- On date 2, John is still among the best 3, so he remains selected, while Audrey is only 4th, so Jim (ranked 1st on date 2) replaces her.
- On date 3, John is still among the best 3, so he remains selected (there are no ties in the data I actually work on). Jim is now 4th, so Sandra takes his place.
```r
structure(list(Dates = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("12/02", "13/02", "14/02" ), class = "factor"), Classes = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ), Names = structure(c(6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L, 6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L, 6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L), .Label = c("Ashley", "Audrey", "Bob", "Denis", "Jim", "John", "Kim", "Sandra", "Terry", "Tim"), class = "factor"), Grades = c(10, 5, 3, 2, 1, 3, 4, 5, 6, 7, 8, 2, 10, 9, 1, 7, 5, 1, 8, 2, 5, 1, 4, 8, 8, 7, 6, 5, 4, 3)), .Names = c("Dates", "Classes", "Names", "Grades"), row.names = c(NA, -30L), class = "data.frame")
```
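Putting the rule together on the example data: a sketch that splits the rows by class, then by date, and folds a hypothetical `pick_best()` step over the dates with `Reduce()`. It rebuilds the same 30 rows as the `dput()` output in a readable form and assumes the `Dates` factor levels are in chronological order (true here, but not guaranteed for `dd/mm` strings sorted alphabetically):

```r
# Same data as the dput() output, written out explicitly
dat <- data.frame(
  Dates   = factor(rep(c("12/02", "13/02", "14/02"), each = 10),
                   levels = c("12/02", "13/02", "14/02")),  # chronological order
  Classes = rep(rep(c(1, 2), each = 5), times = 3),
  Names   = rep(c("John", "Bob", "Terry", "Kim", "Ashley",
                  "Sandra", "Denis", "Tim", "Jim", "Audrey"), times = 3),
  Grades  = c(10, 5, 3, 2, 1, 3, 4, 5, 6, 7,
              8, 2, 10, 9, 1, 7, 5, 1, 8, 2,
              5, 1, 4, 8, 8, 7, 6, 5, 4, 3),
  stringsAsFactors = FALSE
)

# Hypothetical step: previous pick + one date's rows of a class -> new pick
pick_best <- function(prev_best, day) {
  ranked <- day$Names[order(day$Grades, decreasing = TRUE)]  # best first
  if (!is.na(prev_best) && prev_best %in% head(ranked, 3)) prev_best else ranked[1]
}

# For each class: split its rows by date and fold pick_best over the dates
per_class <- lapply(split(dat, dat$Classes), function(cls) {
  by_date <- split(cls, cls$Dates)  # list ordered by the factor levels
  picks <- Reduce(pick_best, by_date, accumulate = TRUE, init = NA_character_)[-1]
  data.frame(Dates = names(by_date), Classes = cls$Classes[1], Best = picks,
             stringsAsFactors = FALSE)
})

result <- do.call(rbind, per_class)
rownames(result) <- NULL
print(result)
```

This gives John, John, John for class 1 and Audrey, Jim, Sandra for class 2, matching the expected selection described in the example.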