0

I have a dataset of NBA players and their performance statistics such as games played, points per season, etc. I have multiple seasons for each player and a column Season.End, which indicates the year in which the data is from. The column G indicates the games played for that particular season.

   Season.End Player Pos     Age Tm        G    GS    MP   PER OWS     DWS WS   
        <int> <chr>  <fct> <int> <fct> <int> <int> <int> <dbl> <chr> <dbl> <chr>
 1       2001 A.C. ~ PF       37 MIA      82     1  1411  11.2 1.1     2.1 3.2  
 2       2002 A.J. ~ PG       23 CHI      45     6   607  11.8 0       0   0.1  
 3       2001 A.J. ~ PG       22 CHI      33     8   630  10.3 0.4    -0.1 0.3  
 4       2003 A.J. ~ PG       24 GSW       2     0     9  -9   -0.1    0   -0.1 
 5       2017 A.J. ~ C        24 DAL      22     0   163   8.4 -0.2    0.2 0    
 6       2013 A.J. ~ PG       26 WAS      57    22  1278  12.4 1       1.2 2.2  
 7       2010 A.J. ~ PG       23 IND      56     2   865  14   0.4     0.8 1.2  
 8       2011 A.J. ~ PG       24 IND      50     0   795  10.7 -0.4    0.7 0.3  
 9       2012 A.J. ~ PG       25 IND      44     1   568  11.5 0.2     0.5 0.7  
10       2014 A.J. ~ SG       27 MIN      28     0    99   9.7 -0.1    0   0  

I would like to add a column that sums up the number of games that a player has played in previous seasons, so before the season. Ideally, it would look like something like this:

   Season.End Player Pos     Age Tm        G    total_games
         2001 A.J.   PG      22  CHI       33           0   
         2002 A.J.   PG      23  CHI       45           33
         2003 A.J.   PG      24  GSW       2            78  
         2014 B.K.   C       27  HOU       50           0
         2015 B.K.   C       28  HOU       74           50
         2016 B.K.   C       29  HOU       66           124

I tried multiple variations of using dplyr function such as group_by, summarise, etc. and looked into transforming my Season.End column into a date format, but I never managed to get the desired results.

Does anybody have a solution for my problem?

1 Answers1

0

You first group the data by Player and then sort by Season using arrange. The new column containing the running sum of games is calculated with cumsum and added to the record with mutate. c(0,head(cumsum(G),-1) ) is used to show 0 for a player's first year and then, using head(cumsum(G) ,-1) follow it by the running sum of games except the last one.

I've modified the sample input data to make it look more like your sample output. The code looks like:

  record <- tribble(
   ~Season_End, ~Player,  ~Pos, ~Age, ~G,    
     2001,      "A.C.",   "PF",       37,       82,     
     2002,       "A.J.",   "PG",       23,       45,    
     2001,       "A.J.",    "PG",       22,       33,    
     2003,       "A.J.",   "PG",       24,       2,      
     2017,       "B.K.",   "C",        24,       22,        
     2013,       "B.K.",   "PG",       26,       57,    
     2010,       "B.K.",   "PG",       23,       56 )   

record <- record %>% group_by(Player) %>% 
          arrange( Season_End) %>%
          mutate(total_games = c(0,head(cumsum(G),-1) ))

which gives the result

Season_End Player Pos     Age     G total_games
       <dbl> <chr>  <chr> <dbl> <dbl>       <dbl>
1       2001 A.C.   PF       37    82           0
2       2001 A.J.   PG       22    33           0
3       2002 A.J.   PG       23    45          33
4       2003 A.J.   PG       24     2          78
5       2010 B.K.   PG       23    56           0
6       2013 B.K.   PG       26    57          56
7       2017 B.K.   C        24    22         113
WaltS
  • 5,410
  • 2
  • 18
  • 24