
I'm looking to run a loop to create a binary variable that identifies whether two conditions are both true. In this case, I want to identify whether the defensive team is also the home team (i.e., the stadium is the defensive team's home stadium).

Offensive Team       Defensive Team         Stadium                   
Dodgers              Yankees                Yankee Stadium
Red Sox              Dodgers                Dodger Stadium
Cubs                 Astros                 Wrigley Field
Yankees              Dodgers                Yankee Stadium

Thus, I want my final dataframe to look like this.

Offensive Team       Defensive Team         Stadium                Defense_Home          
Dodgers              Yankees                Yankee Stadium         1
Red Sox              Dodgers                Dodger Stadium         1
Cubs                 Astros                 Wrigley Field          0
Yankees              Dodgers                Yankee Stadium         0

I understand, of course, that I will have to write out a full list of stadiums and the teams they correspond to; I'm just looking for a template for how to code it. I also understand that this is a pretty beginner-level question, as I am not that good at writing loops in R. Doing the same thing in Stata would be much easier for me. Still learning.

Thanks in advance!

887
  • Don't use a loop. Use a single `ifelse()` statement, but you need to create a separate data frame linking teams to stadiums. If you want to provide data here, use `dput()`. – dcarlson Jul 14 '20 at 03:12
  • `df$defense_home <- stadiums[df$defensive_team] == df$stadium` where `stadiums` is a named vector like `stadiums <- c(Cubs = 'Wrigley Field', Yankees = 'Yankee Stadium', ...)` – alistaire Jul 14 '20 at 03:18
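The named-vector lookup suggested in the comment above can be sketched on the question's data as follows. This is a minimal sketch under some assumptions: the column names without spaces (`OffensiveTeam`, `DefensiveTeam`, `Stadium`) are guesses at how the data was read in, the vector lists only the three stadiums that appear in this thread, and treating a team that is missing from the lookup as "not at home" (0 rather than `NA`) is a choice made here, not something the comment specifies.

```r
# Data from the question; the space-free column names are an assumption.
df <- data.frame(
  OffensiveTeam = c("Dodgers", "Red Sox", "Cubs", "Yankees"),
  DefensiveTeam = c("Yankees", "Dodgers", "Astros", "Dodgers"),
  Stadium = c("Yankee Stadium", "Dodger Stadium", "Wrigley Field", "Yankee Stadium"),
  stringsAsFactors = FALSE
)

# Named vector: names are teams, values are their home stadiums.
# Deliberately incomplete (no Astros entry) to show what happens to unknown teams.
stadiums <- c(Cubs = "Wrigley Field",
              Yankees = "Yankee Stadium",
              Dodgers = "Dodger Stadium")

# Indexing by name returns each defensive team's home stadium (NA if not listed).
home <- stadiums[df$DefensiveTeam]

# 1 when the game is in the defensive team's home stadium, otherwise 0.
# Teams missing from the lookup are counted as 0 here (an assumption).
df$Defense_Home <- as.integer(!is.na(home) & home == df$Stadium)

print(df)
```

No loop is needed because both the name lookup and the comparison are vectorised over all rows at once.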

1 Answer


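You can create a dataframe with each team name and its corresponding home stadium. The stadium names must be spelled exactly as they appear in the original data.

# Lookup table: one row per team; add the remaining teams in the same way
lookup_data <- data.frame(Team = c('Yankees', 'Dodgers'), 
                          home_stadium = c('Yankee Stadium', 'Dodger Stadium'),
                          stringsAsFactors = FALSE)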

You can then merge this data with the original dataframe by team name and set Defense_Home to 1 if Stadium and home_stadium are the same.

transform(merge(df, lookup_data, by.x = 'DefensiveTeam', by.y = 'Team', 
    all.x = TRUE), Defense_Home = as.integer(Stadium == home_stadium))

You can also do this with dplyr:

library(dplyr)
df %>%
  left_join(lookup_data, by = c('DefensiveTeam' = 'Team')) %>%
  mutate(Defense_Home = as.integer(Stadium == home_stadium))
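As a worked illustration of the merge approach on the question's data, here is a minimal base R sketch. It rests on a few assumptions: the space-free column names are guesses at how the data was read in, `row_id` is a helper column introduced here only to restore the original row order (base `merge()` may reorder rows), and teams that are missing from the lookup are coded as 0 instead of `NA`.

```r
# Data from the question; the space-free column names are an assumption.
df <- data.frame(
  OffensiveTeam = c("Dodgers", "Red Sox", "Cubs", "Yankees"),
  DefensiveTeam = c("Yankees", "Dodgers", "Astros", "Dodgers"),
  Stadium = c("Yankee Stadium", "Dodger Stadium", "Wrigley Field", "Yankee Stadium"),
  stringsAsFactors = FALSE
)

# Partial lookup table, spelled exactly like the Stadium column.
lookup_data <- data.frame(
  Team = c("Yankees", "Dodgers"),
  home_stadium = c("Yankee Stadium", "Dodger Stadium"),
  stringsAsFactors = FALSE
)

# Helper column (hypothetical name) so the original order can be restored.
df$row_id <- seq_len(nrow(df))

# Left join: keep every row of df, attach the defensive team's home stadium.
merged <- merge(df, lookup_data,
                by.x = "DefensiveTeam", by.y = "Team", all.x = TRUE)

# Teams without a lookup entry have home_stadium = NA; code those as 0.
merged$Defense_Home <- as.integer(
  !is.na(merged$home_stadium) & merged$Stadium == merged$home_stadium
)

# Put the rows back in their original order.
merged <- merged[order(merged$row_id), ]

print(merged)
```

Without the `!is.na()` guard, the Astros row would get `NA` rather than 0, because comparing a string with `NA` yields `NA`.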
Ronak Shah
  • Hi Ronak, thank you again for your help. I tried the transform command to combine the new dataframe with the existing dataframe, but I seem to be getting somewhat randomly assigned values in the Defense_Home column. I went back and double-checked the new dataframe that matches stadium name to Team, and it looks like I defined each stadium correctly, but when I merge that with the original dataframe I'm getting screwy results. Any idea where I may be going wrong? – 887 Jul 14 '20 at 15:00
  • What results do you get? Perhaps, add `stringsAsFactors = FALSE` while creating the csv above and reading the data in R. – Ronak Shah Jul 14 '20 at 15:10
  • After re-examining, I realized that the issue was only with one team, whose stadium name was "National's Park". When I typed out the code for it in my word document, I inadvertently used a different kind of apostrophe within the name, which led to a mismatch between the two stadium columns, and caused Defense_Home to read as 0 for all values for that team. In short, the problem was my own fault, your code worked just fine. Thank you again! – 887 Jul 14 '20 at 15:48
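The apostrophe mismatch described in the last comment can be reproduced and guarded against with a short sketch like the one below. The function name `normalize_quotes` and the two small example vectors are hypothetical, introduced only for illustration; the idea is to replace typographic single quotes with the plain ASCII apostrophe before comparing, and to list any stadium names that have no exact match in the lookup.

```r
# The same name typed with a curly apostrophe (U+2019) and a straight one.
curly    <- "National\u2019s Park"
straight <- "National's Park"
curly == straight  # FALSE: the strings look alike but differ by one character

# Hypothetical helper: map typographic single quotes to the ASCII apostrophe.
normalize_quotes <- function(s) {
  s <- gsub("\u2018", "'", s, fixed = TRUE)  # left single quotation mark
  gsub("\u2019", "'", s, fixed = TRUE)       # right single quotation mark
}

normalize_quotes(curly) == straight  # TRUE after normalising

# Diagnostic: names in the data that never appear in the lookup.
# Both vectors here are made-up stand-ins for the real columns.
data_stadiums   <- c("Yankee Stadium", straight)
lookup_stadiums <- c("Yankee Stadium", curly)
unmatched <- setdiff(data_stadiums, lookup_stadiums)
print(unmatched)
```

Running a `setdiff()` check like this before the merge makes spelling mismatches visible up front, instead of surfacing later as unexpected zeros.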