0

I looked around the site, but I could not find this specific issue. I'm trying to prepare my dataset for further analysis, but I can't seem to fix something.

I have a list of Players and the club they currently play at:

PlayerID PlayerName        CurrentClub
1        Roland Alberg     ADO Den Haag
2        John Goossens     Feyenoord
3        Michael de Leeuw  Feyenoord
4        Kenny van der Weg NAC Breda
5        Alex Schalk       NAC Breda

Where I want to get is:

NewID CurrentClub       Player1             Player2

1     ADO Den Haag      Roland Alberg       NA
2     Feyenoord         John Goossens       Michael de Leeuw
3     NAC Breda         Kenny van der Weg   Alex Schalk 

I've tried various methods with melt, group_by and transpose, but I never got it this result.

Does anybody know how to do this?

BdJ
  • 47
  • 4
  • 1
    https://tidyr.tidyverse.org/reference/spread.html – GordonShumway Oct 30 '18 at 17:16
  • So I tried your suggestion and did this: Test <- spread(dataset, CurrentClub, PlayerName, fill = NA, convert = FALSE, drop = TRUE, sep = NULL) And got the following error message: Error: Duplicate identifiers for rows (144, 567, 945, 1257, 1753, 2167), (189, 680, 1026), (90, 683, 882, 1714), (91, 507, 1577, 1715), (278, 733, 1192), (7, 608), – BdJ Oct 30 '18 at 17:23
  • @camille I don't think so as this situation is a lot more complex than the other one (the other one is a simple spread, this one is not) – prosoitos Oct 30 '18 at 17:31
  • Start with this extended version `reshape(df, idvar = "CurrentClub", timevar = "PlayerID", direction = "wide")` – nghauran Oct 30 '18 at 17:49

2 Answers2

3

A combination of row_number and group_by should do the trick. Here is my solution:

df <- tibble(PlayerID = c(1,2,3,4,5),
   PlayerName = c("Alberg", "Goossens","Leeuw","van der Weg","Schalk"),
   CurrentClub = c("ADO Den Haag", "Feyenoord", "Feyenoord", "NAC Breda", "NAC Breda"))

 new_df <- df %>% group_by(CurrentClub) %>% select(-PlayerID) %>%
 mutate(player_number = paste0("Player ",row_number())) %>%
 spread(player_number, PlayerName)

 new_df
Henry Cyranka
  • 2,970
  • 1
  • 16
  • 21
  • 2
    Oh, I see you suggested basically the same solution as me. row_number is a good call. Took me a while to realize that playerID was preventing from my solution to work... :) – iod Oct 30 '18 at 17:35
  • 1
    Bingo! Thanks a bunch Harro. – BdJ Oct 30 '18 at 18:09
0
df[,-1] %>% 
group_by(CurrentClub) %>% 
mutate(Player=seq(1:n())) %>% 
spread(Player, PlayerName, sep="")
iod
  • 7,412
  • 2
  • 17
  • 36