1

I would like to go from this structure:

game_id team pts
1 400597848 TOWS  53
2 400597848 COFC  50
3 400595519  ILL  49
4 400595519  WIS  68

To this:

game_id team1 pts1 team2 pts2
1 400597848 TOWS  53  COFC  50
3 400595519  ILL  49  WIS  68

Here's sample data:

d <- structure(list(game_id = c(400597848L, 400597848L, 400595519L,
400595519L), team = c("TOWS", "COFC", "ILL", "WIS"), pts = c(53L,
50L, 49L, 68L)), .Names = c("game_id", "team", "pts"), row.names = c(NA,
4L), class = "data.frame")

I've tried using tidyr and followed this tutorial: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/

However, when I try:

 spread(d, team, pts)

I get repeated columns for all teams but don't want all combinations.

tcash21
  • 4,880
  • 4
  • 32
  • 39

1 Answers1

4

1. data.table

We could use dcast from the devel version of data.table i.e. v1.9.5, which can take multiple 'value.var' columns. It can be installed from here.

Convert the 'data.frame' to 'data.table' (setDT(d)), create a sequence column ('ind'), grouped by 'game_id', and then use dcast on the modified dataset specifying the 'value.var' as 'team', and 'pts'.

dcast(setDT(d)[, ind:= 1:.N, by=game_id], game_id~ind, 
                   value.var=c('team', 'pts'))
#     game_id 1_team 2_team 1_pts 2_pts
#1: 400595519    ILL    WIS    49    68
#2: 400597848   TOWS   COFC    53    50

2. base R

Another option is using reshape from base R after creating the 'ind' column.

 d1 <- transform(d, ind=ave(seq_along(game_id), game_id, FUN=seq_along))

 reshape(d1, idvar='game_id', timevar='ind', direction='wide')
 #    game_id team.1 pts.1 team.2 pts.2
 #1 400597848   TOWS    53   COFC    50
 #3 400595519    ILL    49    WIS    68
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    @tcash21 No problem. I think `spread` (at present) can take only single value columns (haven't checked the new vignettes). – akrun Mar 22 '15 at 16:47