0

I have a dataset of unique matches like this. Each row is a match with result.

date <- c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01')
team1 <- c('A','B','B','C','D','A','B')
team1_score <- c(1,0,4,3,5,6,7)
team2 <- c('B','A','A','B','C','C','A')
team2_score <- c(0,1,5,4,6,9,10)
matches <- data.frame(date, team1, team1_score, team2, team2_score)

I want to create 2 new columns, forms for team 1 and team 2. The result of the match can be determined by which team have a larger score or a draw. The result would look something like below. So the form would be the result of team1 in the last 2 matches. For example, for the first 3 rows, form of team 1 and 2 respectively are. There will be times where there are not enough 2 matches of a particular team, so a result of NULL is sufficient. I want to know the form of team1 and team2 going into a match.

  • Form1: W-W, L-W, W-L
  • Form2: L-L, W-L, L-W

In the actual data set, there are a lot more than just 4 unique teams. I have been thinking but can't think of a good way to create these 2 variables.

  • 2
    How do you know about a result? Does a score of 'B' beat a score of 1? – Gregor Thomas Dec 06 '17 at 14:59
  • 1
    Please provide a reproducible example. Look for the function dput() to help you provide a reproducible dataset. You have many options. If the data is in long (i.e. "tidy") format, you could have a rolling sum of victories and defeats, that would yield a result ina format like W|L, or according to your examples, 3|2. You could also have an if_else function creating a letter string form math results rolling against the last five iterations, and then concatenating those strings. – Nicolás Velasquez Dec 06 '17 at 15:03
  • @Gregor: Yea, the team can be in Team1 or Team2, if they have a larger score, it means they win. – Khiem Nguyen Dec 06 '17 at 15:30
  • To work on this problem, the first step is to get data. It would be really nice if you would *share data*, rather than ask anyone who works on this problem to do the work of creating fake data. You don't need to share much, but 6-10 rows of matches with about 3 teams. You'll get help even faster if your example data is copy/pasteable, either by sharing the code to simulate it or sharing the outptut of `dput()` to create its structure. [There are lots of tips on making R reproducible examples here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Gregor Thomas Dec 06 '17 at 15:41
  • @Gregor: I have adjusted my question above to make it clearer. – Khiem Nguyen Dec 06 '17 at 15:41
  • @NicolásVelásquez: i have edited my question to have a reproducible example. I am not sure about your solution to have a rolling sum. Can you explain more? – Khiem Nguyen Dec 06 '17 at 15:53

1 Answers1

0

Here is my solution:

    library(tidyverse)


    date <- as.Date(c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01', '2017/05/30'))
    team1 <- c('A','B','B','C','D','A','B','A')
    team1_score <- c(1,0,4,3,5,6,7,0)
    team2 <- c('B','A','A','B','C','C','A','D')
    team2_score <- c(0,1,5,4,6,9,10,0)
    matches <- data.frame(date, team1, team1_score, team2, team2_score)

    ## 1. Create a unique identifier for each match. It assumes that teams can only play each other once a day.
    matches$UID <- paste(matches$date, matches$team1, matches$team2, sep = "-")

    ## 2. Create a Score Difference Varaible reflecting team1's score
    matches <- matches %>% mutate(score_dif_team1 = team1_score - team2_score)

    ## 3. Create a Result (WDL) reflecting team1's results
    matches <- matches %>% mutate(results_team1 = if_else(score_dif_team1 < 0, true = "L", false = if_else(score_dif_team1 > 0, true = "W", false = "D")))

    ## 4. Cosmetic step: Reorder variables for easier comparison across variables
    matches <- matches %>% select(UID, date:results_team1)

    ## 5. Reshape the table into a long format based on the teams. Each observation will now reflect the results of 1 team within a match. Each game will have two observations.
    matches <- matches %>% gather(key = old_team_var, value = team, team1, team2)

    ## 6. Stablishes a common results variable for each observation.  It essentially inverts the results_team1 varaible for teams2, and keeps results_team1 identical for teams1
    matches <- matches %>% 
                mutate(results = if_else(old_team_var == "team2", 
                                                    true = if_else(results_team1 == "W", 
                                                                   true = "L", 
                                                                   false = if_else(results_team1 == "L", 
                                                                                     true = "W",
                                                                                     false = "D")),
                                                    false = results_team1))

## Final step: Filter the matches table by the dates you are interested into, and then reshapes the table to show a data frame of DLW in long format.

    Results_table <- matches %>% filter(date <= as.Date("2017-12-01")) %>% group_by(team, results) %>% summarise(cases = n()) %>% spread(key = results, value = cases, fill = 0)

## Results:
    # A tibble: 4 x 4
    # Groups:   team [4]
       team     D     L     W
    * <chr> <dbl> <dbl> <dbl>
    1     A     1     1     4
    2     B     0     4     1
    3     C     0     1     2
    4     D     1     1     0
Nicolás Velasquez
  • 5,623
  • 11
  • 22