
First post ever here, but I've been reading a lot so thanks!

I have a huge data frame with many columns, but only 4 matter here:

dates/classes/names/grades.

For each date, I have several classes, each with several students (names; the same people always stay in their respective classes), and each student has ONE grade per date.

On the first date, I retrieve the best student per class based on grade, using max().
However, for the next dates, I want to do the following:

  • If the previous best student is still in the top 3 of his class, then we consider him to still be the best one.
  • Else, we consider the new 1st student to be the best one.

Hence, every date depends on the previous one.

Is it possible to do this without a loop?
I can't find out how, as every iteration depends on the previous one.

This is my code below. Apologies if it's not optimized!

Thanks a lot :)

for (i in (1:(length(horizon)-1))) #horizon is the vector of dates
{
    uni3 <- dataaf[dataaf[,1] == as.numeric(horizon[i]),]     #dataaf contains all the data, we only keep the rows for the considered date i

    if (i == 1)                             #we take the best student per class
    {
        selecdate <- data.frame()                             #selecdate is the dataframe containing the best people for this date

        for (z in (1:15))    #15 classes
        {
            selecsec <- na.omit(uni3[uni3[,14] == z,])                 #classes are column 14
            ligneselec <- max(selecsec[,13])                          #grades are column 13
            selecsec <- data.frame(selecsec[match(ligneselec,selecsec[,13]),])   #look up within this class only, not the whole date
            selecdate <- rbind(selecdate,selecsec)
        }
    } 
    else {              #we keep a student if he was in the previous top 3, else we take the best one
        selecdate <- data.frame()

        for (z in (1:15))
        {
            lastsec <- na.omit(lastdate[lastdate[,14] == z,])         #last results

            #retrieving the top 3 people this date
            selecsec <- na.omit(uni3[uni3[,14] == z,])
            newligneselec <- tail(sort(selecsec[,13]),3)
            selecsec <- data.frame(selecsec[rev(match(newligneselec,selecsec[,13])),])

            if (!any(selecsec[,3] %in% lastsec[,3]))   #names are column 3: previous best is not in this date's top 3
            {
                ligneselec <- max(selecsec[,13])
                selecsec <- data.frame(selecsec[match(ligneselec,selecsec[,13]),])   #look up within this class only
            } 
            else 
            {
                selecsec <- lastsec
            } 

            selecdate <- rbind(selecdate,selecsec)
        }
    }

    lastdate <- selecdate #recording the last results
}

EDIT : Here is an example.

  • On date 1, John and Audrey are selected in classes 1 and 2 respectively.
  • On date 2, John is still among the best 3, so he remains selected, while Audrey is only 4th so Jim (ranked 1st for the date 2) replaces her.
  • On date 3, John is still among the best 3, so he remains selected (ties are not an issue in the data I work on). Jim is now 4th, so Sandra takes his place.

    structure(list(Dates = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("12/02", "13/02", "14/02" ), class = "factor"), Classes = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ), Names = structure(c(6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L, 6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L, 6L, 3L, 9L, 7L, 1L, 8L, 4L, 10L, 5L, 2L), .Label = c("Ashley", "Audrey", "Bob", "Denis", "Jim", "John", "Kim", "Sandra", "Terry", "Tim"), class = "factor"), Grades = c(10, 5, 3, 2, 1, 3, 4, 5, 6, 7, 8, 2, 10, 9, 1, 7, 5, 1, 8, 2, 5, 1, 4, 8, 8, 7, 6, 5, 4, 3)), .Names = c("Dates", "Classes", "Names", "Grades"), row.names = c(NA, -30L), class = "data.frame")

Myrith

2 Answers

0

Edited to reflect clarified request in the comments.

###---------- CREATING THE DATA (may be different from what you had in mind)
# Classes and Students
Classes <- c("U.S. History", "English", "NonLinear Optimization")
Students <- c("James", "Jamie", "John", "Jim", "Jane", "Jordan", "Jose")
df.1 <- expand.grid(Classes = Classes, Students = Students, stringsAsFactors = T)
# Generate Dates
Dates.seq <- seq(as.Date("2017/2/10"), as.Date("2017/3/27"), "days")
df.2 <- merge(Dates.seq, df.1)
# Generate Grades
grading <- c(4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7)
Grades <- sample(grading, size = dim(df.2)[1], replace = T, prob = grading/sum(grading)) # smart students
df <- data.frame(df.2, Grades)
colnames(df) <- c("Dates","Classes","Students","Grades")

# Works assuming your df has the following labeled and formatted columns
str(df)
#'data.frame':  966 obs. of  4 variables:
#  $ Dates   : Date, format: "2017-02-10" "2017-02-11" "2017-02-12" ...
#  $ Classes : Factor w/ 3 levels "U.S. History",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ Students: Factor w/ 7 levels "James","Jamie",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ Grades  : num  2.3 3.3 2.3 3.3 2.7 4 4 1.7 2.3 4 ...

# No aggregation, just splitting by classes
df.split1 <- split(df, df[,"Classes"])
# Then splitting each of those lists by Dates
df.split2 <- lapply(df.split1, function(x) split(x, x[,"Dates"]))
# double lapply because now we have lists within lists
top1 <- lapply(df.split2, function(i) lapply(i, function(j) j[order(-j[,"Grades"])[1], "Students"]))
top3 <- lapply(df.split2, function(i) lapply(i, function(j) j[order(-j[,"Grades"])[1:3], "Students"]))

# Easier to read
AllClasses <- levels(df[,"Classes"])
AllDates <- unique(df[,"Dates"])

# Initialize a matrix to keep track of changes in the Top1 and Top3
superstar <- matrix(NA, nrow = length(AllDates), ncol = length(AllClasses), 
                    dimnames = list(as.character(AllDates), AllClasses))

# Looping
for(date in 1:length(AllDates)){
  for(class in AllClasses){
    if(date == 1){ 
      # First NewTop1 = First Top1 
      superstar[date, class] <- unlist(top1[[class]][date])
    } else {
      # If superstar in date-1 is in the Top3 of date now,
      if(superstar[date-1, class] %in% as.numeric(unlist(top3[[class]][date]))){
        # still superstar
        superstar[date,class] <- superstar[date-1, class]
      } else{
        # new superstar is highest scorer of date now
        superstar[date,class] <- unlist(top1[[class]][date])
      }
    }
  }
}
# painful for me trying to figure out how to convert superstar numbers to names but this worked
superstar.char <- as.data.frame(matrix(levels(df[,"Students"])[superstar], ncol = length(AllClasses)))
dimnames(superstar.char) <- dimnames(superstar)
superstar.char # superstar with Students as characters 

Let me know if you have any difficulties!

Evan Friedland
  • Hi and thanks a lot! I've been trying to use your code, which seems great. Unfortunately I would like not to aggregate classes (i want the best student per class), and I have some troubles adapting the rest of the code. – Myrith Mar 30 '17 at 09:10
  • Hi again! I think I gave a bad explanation about what I wanted exactly. A superstar in date i is still a superstar in date (i+1) if he's among the 3 best, but he's also still a superstar if he's among the 3 best in date (i+2). Your code doesn't seem to do this. See line 8 below, where I would have expected Jose to remain a superstar (column 1 is top 1, columns 2-4 are top 3, column 5 is the superstar). – Myrith Mar 30 '17 at 14:24
  • structure(c("James", "Jim", "Jordan", "John", "Jordan", "Jose", "Jane", "James", "James", "Jim", "Jordan", "John", "Jordan", "Jose", "Jane", "James", "Jordan", "John", "Jim", "Jose", "Jane", "Jane", "Jordan", "Jose", "Jim", "James", "Jane", "Jamie", "Jose", "James", "Jose", "Jamie", "James", "James", "Jim", "John", "Jordan", "Jose", "Jose", "James"), .Dim = c(8L, 5L), .Dimnames = list( NULL, c("", "", "", "", "superstar"))) – Myrith Mar 30 '17 at 14:27
  • He is a superstar in [i+1] ONLY if he was in the top 3 of [i+0]. Then in [i+2], if he is actually in the top 3 highest AND also was in [i+1], make him a superstar- got it I'll try again. And this is for EACH class across dates, so if each students attends 3 classes, there will be 3 superstars per date? 1 for each class? – Evan Friedland Mar 30 '17 at 16:36
  • Hi, thanks for your work, I really appreciate it! I will take the time to adapt it. Looks like there's no way to avoid a loop for this then? :) – Myrith Apr 03 '17 at 13:20
  • I've spent a while trying to figure it out but have come up short each time. Thankfully, the for loop method is very legible, and if the solution above ends up working for you, please upvote/mark as answered :D – Evan Friedland Apr 03 '17 at 16:15
  • Hi, looks like it does, though I try to avoid loops because I've read that it takes a lot of time in R (you can see I had a code already working with loops). But maybe this will help saving time, so I will try to adapt it :) – Myrith Apr 04 '17 at 09:32
  • Myrith if this helped you out would you accept this as your answer? :) – Evan Friedland Jun 23 '17 at 14:50
0

Anything you would otherwise solve with a loop can also be solved with a recursive function (a function that calls itself). Since the behavior of the function changes depending on i, you'll need to pass i as a parameter into the function. The function also needs a base case, so that it can tell when it is done and return the result set.
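
As a rough illustration of that idea, here is a minimal sketch on the sample data from the question. The helper name pick_best and its arguments are made up for this example, and the data frame is rebuilt with plain strings instead of factors. It assumes there are no ties for first place and that sorting the date labels alphabetically puts them in chronological order, which holds for this sample but may not hold for real data.

```r
# Sample data from the question, rebuilt in long form (strings instead of factors).
grades <- data.frame(
  Dates   = rep(c("12/02", "13/02", "14/02"), each = 10),
  Classes = rep(rep(c(1, 2), each = 5), times = 3),
  Names   = rep(c("John", "Bob", "Terry", "Kim", "Ashley",
                  "Sandra", "Denis", "Tim", "Jim", "Audrey"), times = 3),
  Grades  = c(10, 5, 3, 2, 1, 3, 4, 5, 6, 7,
              8, 2, 10, 9, 1, 7, 5, 1, 8, 2,
              5, 1, 4, 8, 8, 7, 6, 5, 4, 3),
  stringsAsFactors = FALSE
)

# pick_best() is a hypothetical helper, not part of any package.
# `days` is a list of per-date data frames for ONE class, in date order;
# `i` is the index of the current date; `best` collects the selected names so far.
pick_best <- function(days, i = 1, best = character(0)) {
  if (i > length(days)) return(best)             # base case: every date is processed
  today  <- days[[i]]
  ranked <- today$Names[order(-today$Grades)]    # names from best to worst grade
  # Keep the previous pick if it is still in today's top 3 (never true on date 1).
  keep   <- i > 1 && best[i - 1] %in% head(ranked, 3)
  chosen <- if (keep) best[i - 1] else ranked[1]
  pick_best(days, i + 1, c(best, chosen))        # recurse on the next date
}

# Run the recursion once per class; split() orders the dates by label.
best_by_class <- lapply(split(grades, grades$Classes), function(cls) {
  days <- split(cls, cls$Dates)
  setNames(pick_best(days), names(days))
})

print(best_by_class)
```

On the sample data this should reproduce the example in the question: John on all three dates for class 1, and Audrey, Jim, Sandra for class 2. Note that this only hides the iteration rather than removing it; it is unlikely to be faster than the for loop, and R limits how deeply calls can nest, so for a long run of dates a plain loop or Reduce() with accumulate = TRUE is probably the safer choice.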

russellpierce