0

this is my first question on this forum.

I would like to re-model the structure of my dataset. I would like to split the column "Teams" into two columns. One with the hometeam and another with the awayteam. I also would like to split the result into two columns. Homegoals and Awaygoals. The new columns should not have a zero infront of the "real" goals scored.

BEFORE

Date        Time    Teams               Results Homewin Draw    Awaywin
18 May      19:45   AC Milan - Sassuolo 02:01   1.26    6.22    10.47
18 May      19:45   Chievo - Inter      02:01   3.73    3.42    2.05
18 May      19:45   Fiorentina - Torino 02:02   2.84    3.58    2.39

AFTER

Date Time Hometeam Awayteam Homegoals Awaygoals Homewin Draw    Awaywin
18 May  19:45   AC Milan    Sassuolo    2   1   1.26    6.22    10.47
18 May  19:45   Chievo      Inter       2   1   3.73    3.42    2.05
18 May  19:45   Fiorentina  Torino      2   2   2.84    3.58    2.39

Can R fix this problem for me? Which packages do i need? I want to be able to do this for many excel spreadsheets with different leagues and divisions but all with the same structure.

Can someone help me and my data.frame?

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
Developer
  • 917
  • 2
  • 9
  • 25

2 Answers2

1

tidyr solution:

separate(your.data.frame, Teams, c('Home', 'Away'), sep = " - ")

Base R solution (following this answer):

df <- data.frame(do.call(rbind, strsplit(as.character(your.df$teams), " - ")))
names(df) <- c("Home", "Away")
Community
  • 1
  • 1
Richard Border
  • 3,209
  • 16
  • 30
  • 1
    Base R solution. This one works great too. I prefer this one because of its simplicity without any packages. Thank you! – Developer Apr 10 '15 at 14:52
1

Here's an approach that uses cSplit from the splitstackshape package, which uses and returns a data.table. Presuming your original data frame is named df,

library(splitstackshape)

setnames(
    cSplit(df, 3:4, c(" - ", ":"))[, c(1:2, 6:9, 3:5), with = FALSE], 
    3:6, 
    paste0(c("Home", "Away"), rep(c("Team", "Goals"), each = 2))
)[]
#      Date  Time   HomeTeam AwayTeam HomeGoals AwayGoals Homewin Draw Awaywin
# 1: 18 May 19:45   AC Milan Sassuolo         2         1    1.26 6.22   10.47
# 2: 18 May 19:45     Chievo    Inter         2         1    3.73 3.42    2.05
# 3: 18 May 19:45 Fiorentina   Torino         2         2    2.84 3.58    2.39
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245