I have an R data.frame of college football data, with two rows for each game (one per team, with stats and so on). I would like to compare the points in those two rows to create a binary Win/Loss variable, but I have no idea how (I'm not very experienced with R). Is there a way to go through the rows, match each one against the other row for the same game (I have a game ID variable, so I'd match on that), and create the binary Win/Loss variable by comparing the points values?

Excerpt of dataframe (many variables left out):

Team Code  Name     Game Code        Date        Site     Points
5          Akron    5050320051201    12/1/2005   NEUTRAL  32
5          Akron    404000520051226  12/26/2005  NEUTRAL  23
8          Alabama  419000820050903  9/3/2005    TEAM     37
8          Alabama  664000820050910  9/10/2005   TEAM     43

What I want is to append a new column: a binary variable that is 1 if the team won and 0 if it lost. To work this out, I need to take a game code, say 5050320051201, find the other row with that same game code (there is exactly one, for the other team in that game), compare the points values of the two rows, and use the result to assign the 1 or 0 for the Win/Loss variable.

John
  • @John: you should add an example where there are exactly two teams for each unique Game Code. – aichao Sep 06 '16 at 21:26

2 Answers


Assuming that your data has exactly two teams for each unique Game Code and that there are no tie games, as in the following example:

df <- structure(list(`Team Code` = c(5L, 6L, 5L, 5L, 8L, 9L, 9L, 8L
), Name = c("Akron", "St. Joseph", "Akron", "Miami(Ohio)", "Alabama", 
"Florida", "Tennessee", "Alabama"), `Game Code` = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 4L, 4L), .Label = c("5050320051201", "404000520051226", 
"419000820050903", "664000820050910"), class = "factor"), Date = structure(c(13118, 
13118, 13143, 13143, 13029, 13029, 13036, 13036), class = "Date"), 
Site = c("NEUTRAL", "NEUTRAL", "NEUTRAL", "NEUTRAL", "TEAM", 
"AWAY", "AWAY", "TEAM"), Points = c(32L, 25L, 23L, 42L, 37L, 
45L, 42L, 43L)), .Names = c("Team Code", "Name", "Game Code", 
"Date", "Site", "Points"), row.names = c(NA, -8L), class = "data.frame")

print(df)
##  Team Code        Name       Game Code       Date    Site Points
##1         5       Akron   5050320051201 2005-12-01 NEUTRAL     32
##2         6  St. Joseph   5050320051201 2005-12-01 NEUTRAL     25
##3         5       Akron 404000520051226 2005-12-26 NEUTRAL     23
##4         5 Miami(Ohio) 404000520051226 2005-12-26 NEUTRAL     42
##5         8     Alabama 419000820050903 2005-09-03    TEAM     37
##6         9     Florida 419000820050903 2005-09-03    AWAY     45
##7         9   Tennessee 664000820050910 2005-09-10    AWAY     42
##8         8     Alabama 664000820050910 2005-09-10    TEAM     43

You can use dplyr to generate what you want:

library(dplyr)
result <- df %>% group_by(`Game Code`) %>% 
                 mutate(`Win/Loss`=if(first(Points) > last(Points)) as.integer(c(1,0)) else as.integer(c(0,1)))
print(result)
##Source: local data frame [8 x 7]
##Groups: Game Code [4]
##
##  Team Code        Name       Game Code       Date    Site Points Win/Loss
##      <int>       <chr>          <fctr>     <date>   <chr>  <int>    <int>
##1         5       Akron   5050320051201 2005-12-01 NEUTRAL     32        1
##2         6  St. Joseph   5050320051201 2005-12-01 NEUTRAL     25        0
##3         5       Akron 404000520051226 2005-12-26 NEUTRAL     23        0
##4         5 Miami(Ohio) 404000520051226 2005-12-26 NEUTRAL     42        1
##5         8     Alabama 419000820050903 2005-09-03    TEAM     37        0
##6         9     Florida 419000820050903 2005-09-03    AWAY     45        1
##7         9   Tennessee 664000820050910 2005-09-10    AWAY     42        0
##8         8     Alabama 664000820050910 2005-09-10    TEAM     43        1

Here, we first group_by the Game Code and then use mutate to create the Win/Loss column within each group. The logic is simply that if the first Points value in a group is greater than the last (there are only two by assumption), we set the column to c(1,0) for that group; otherwise, we set it to c(0,1). Note that this logic does not handle ties (a tied game falls into the else branch and is scored c(0,1)), but it can easily be extended to do so. Note also that we surround the column names with backticks because they contain special characters such as a space and /.
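As a rough sketch of how the tie case might be handled: the version below compares each row's Points with its opponent's and assigns NA when they are equal. The toy data frame `games` (with made-up game codes "A", "B", "C"), the helper column `Opponent`, and the choice of NA for a tie are all assumptions made for illustration, not part of the original data. It still relies on every Game Code having exactly two rows.

```r
library(dplyr)

# Hypothetical toy data: two rows per game, game "C" is a tie.
games <- data.frame(
  `Game Code` = c("A", "A", "B", "B", "C", "C"),
  Name        = c("Akron", "St. Joseph", "Alabama", "Florida",
                  "Tennessee", "Miami(Ohio)"),
  Points      = c(32L, 25L, 37L, 45L, 21L, 21L),
  check.names = FALSE,        # keep the space in `Game Code`
  stringsAsFactors = FALSE
)

result <- games %>%
  group_by(`Game Code`) %>%
  mutate(
    # With exactly two rows per group, reversing Points gives the
    # opponent's score on each row.
    Opponent   = rev(Points),
    `Win/Loss` = case_when(
      Points > Opponent ~ 1L,          # win
      Points < Opponent ~ 0L,          # loss
      TRUE              ~ NA_integer_  # tie (assumed convention)
    )
  ) %>%
  ungroup()

print(result)
```

Comparing against the opponent's score on each row, rather than testing first against last once per group, means the same expression covers wins, losses, and ties without a separate branch for row order.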

aichao

footballdata$SomeVariable[footballdata$Wins == "1"] = stuff

Code your wins as either 1 or 0, i.e. as a binary variable.

R's data frames are nice in that you can subset them with a logical condition, for example to keep only the rows where Wins is 1, and then assign values to a variable for just those rows, as above. If you fill a column from another data frame or vector, make sure the replacement has the same number of elements as the rows you selected.

footballdata$SomeVariable[footballdata$Wins == "1" & footballdata$Team == "Browns"] = Hopeful
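The indexing above presupposes that a Wins column already exists. A minimal base R sketch of one way to build it first is shown below, using ave() to find the top score within each game. The column names `GameCode` and `Points` and the sample rows are assumptions for illustration; the real data uses different names (for example `Game Code`).

```r
# Hypothetical example data: two rows per game.
footballdata <- data.frame(
  Team     = c("Akron", "St. Joseph", "Alabama", "Florida"),
  GameCode = c("5050320051201", "5050320051201",
               "419000820050903", "419000820050903"),
  Points   = c(32L, 25L, 37L, 45L),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each GameCode group and returns a vector
# aligned with the original rows: here, the highest score in each game.
top_score <- ave(footballdata$Points, footballdata$GameCode, FUN = max)

# A team won if its score equals the top score of its game.
# Caveat: in a tied game both teams would be marked 1.
footballdata$Wins <- as.integer(footballdata$Points == top_score)

# With Wins in place, logical indexing selects only the winning rows.
winners <- footballdata[footballdata$Wins == 1, ]
print(winners)
```

Because ave() returns one value per original row, no merge or explicit loop over game codes is needed.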

  • I need to create the wins variable...check the question, I updated it and (hopefully) did a better job at explaining my situation. – John Sep 06 '16 at 20:35