0

I have data on several thousand US basketball players over multiple years.

Each basketball player has a unique ID. It is known for what team and on which position they play in a given year, much like the mock data df below:

df <- data.frame(id = c(rep(1:4, times=2), 1), 
             year = c(1, 1, 2, 2, 3, 4, 4, 4,5),
             team = c(1,2,3,4, 2,2,4,4,2),
             position = c(1,2,3,4,1,1,4,4,4))
> df
  id year team position
1  1    1    1        1
2  2    1    2        2
3  3    2    3        3
4  4    2    4        4
5  1    3    2        1
6  2    4    2        1
7  3    4    4        4
8  4    4    4        4
9  1    5    2        4

What is an efficient way to manipulate df into new_df below?

> new_df
  id move time position.1 position.2 year.1 year.2
1  1    0    2          1          1      1      3
2  2    1    3          2          1      1      4
3  3    0    2          3          4      2      4
4  4    1    2          4          4      2      4
5  1    0    2          1          4      3      5

In new_df the first occurrence of the basketball player is compared to the second occurrence, recorded whether the player switched teams and how long it took the player to make the switch.

Note:

  • In the real data some basketball players occur more than twice and can play for multiple teams and on multiple positions.

  • In such a case a new row in new_df is added that compares each additional occurrence of a player with only the previous occurrence.

Edit: I think this is not a rather simple reshape exercise, because of the reasons mentioned in the previous two sentences. To clarify this, I've added an additional occurrence of player ID 1 to the mock data.

Any help is most welcome and appreciated!

wake_wake
  • 1,332
  • 2
  • 19
  • 46
  • 2
    Possible duplicate of [How to reshape data from long to wide format?](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – LAP Jul 05 '18 at 12:02
  • 1
    Related: [*Transpose / reshape dataframe without “timevar” from long to wide format*](https://stackoverflow.com/q/11322801/2204410) – Jaap Jul 05 '18 at 12:16

3 Answers3

1
s=table(df$id)
df$time=rep(1:max(s),each=length(s))
df1 = reshape(df,idvar = "id",dir="wide")
transform(df1, move=+(team.1==team.2),time=year.2-year.1)

 id year.1 team.1 position.1 year.2 team.2 position.2 move time
1  1      1      1          1      3      2          1    0    2
2  2      1      2          2      4      2          1    1    3
3  3      2      3          3      4      4          4    0    2
4  4      2      4          4      4      4          4    1    2
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • This works great for the mock data provided, but doesn't deal with the last two aspects of the questions. I've updated the question to highlight these. I've also edited the example to show more clearly what I am after. – wake_wake Jul 06 '18 at 03:46
0

The below code should help you get till the point where the data is transposed You'll have to create the move and time variables

df <- data.frame(id = rep(1:4, times=2), 
                 year = c(1, 1, 2, 2, 3, 4, 4, 4),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4))

library(reshape2)
library(data.table)

setDT(df) #convert to data.table
df[,rno:=rank(year,ties="min"),by=.(id)] #gives the occurance

#creating the transposed dataset
Dcast_DT<-dcast(df,id~rno,value.var = c("year","team","position"))
SatZ
  • 430
  • 5
  • 14
0

This piece of code did the trick, using data.table

#transform to data.table
dt <- as.data.table(df)          

#sort on year  
setorder(dt, year, na.last=TRUE) 

#indicate the names of the new columns          
new_cols= c("time", "move", "prev_team", "prev_year", "prev_position")


#set up the new variables
dtt[ , (new_cols) := list(year - shift(year),team!= shift(team), shift(team), shift(year), shift(position)), by = id]

# select only repeating occurrences
dtt <- dtt[!is.na(dtt$time),]

#outcome
dtt
   id year team position time  move prev_team prev_year prev_position
1:  1    3    2        1    2  TRUE         1         1             1
2:  2    4    2        1    3 FALSE         2         1             2
3:  3    4    4        4    2  TRUE         3         2             3
4:  4    4    4        4    2 FALSE         4         2             4
5:  1    5    2        4    2 FALSE         2         3             1
wake_wake
  • 1,332
  • 2
  • 19
  • 46