I have current-season data on NBA players, with over 200 predictors (stats) for each player, covering only this season to date (e.g., average points per game so far this season). I'd like to run a stepwise regression by team or by player, because the predictor set with the most explanatory value will likely differ from team to team or player to player. Ultimately, I want to predict future player performance, so I need access to the model components (i.e., the coefficients, R-squared, etc.).

What would be the best way to automate stepwise regression by group (be it player or team)? The variables (outcome and predictors) are all continuous, so these are essentially just lm models, but I am trying to avoid writing a separate model for every team if there is a way to loop through teams, use something like group_by, or a similar function. To do a stepwise regression for each of 3 teams, I could do this manually for each team:
```r
set.seed(42)  # make the simulated data reproducible
df <- data.frame(team = sample(1:3, 100, replace = TRUE),
                 x1 = rnorm(100, mean = 0, sd = 1),
                 x2 = rnorm(100, mean = 0, sd = 1),
                 y  = rnorm(100, mean = 0, sd = 1))

# Team 1: forward selection from the intercept-only model up to all predictors
model1_empty <- lm(y ~ 1, data = subset(df, team == 1))
model1_full  <- lm(y ~ . - team, data = subset(df, team == 1))
model1_step  <- step(model1_empty, scope = list(lower = model1_empty, upper = model1_full), direction = "forward")

# Team 2
model2_empty <- lm(y ~ 1, data = subset(df, team == 2))
model2_full  <- lm(y ~ . - team, data = subset(df, team == 2))
model2_step  <- step(model2_empty, scope = list(lower = model2_empty, upper = model2_full), direction = "forward")

# Team 3
model3_empty <- lm(y ~ 1, data = subset(df, team == 3))
model3_full  <- lm(y ~ . - team, data = subset(df, team == 3))
model3_step  <- step(model3_empty, scope = list(lower = model3_empty, upper = model3_full), direction = "forward")
```
Is there a more automated way to do this for all 30 teams, or for 200 players?
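One base-R approach is to write the empty/full/step sequence once as a function and apply it to each group with `split()` plus `lapply()`, then collect the components into a summary table. This is a minimal sketch under stated assumptions: the outcome column is named `y`, every column other than the grouping column is a candidate predictor, and `forward_step()` is a hypothetical helper (not an existing function). The simulated data (30 teams, 20 rows each) stands in for the real stats.

```r
# Sketch: one forward stepwise fit per group via split() + lapply().
# Assumes outcome `y`; all non-grouping columns are candidate predictors.
set.seed(1)
df <- data.frame(team = rep(1:30, each = 20),   # 30 teams, 20 rows each
                 x1 = rnorm(600), x2 = rnorm(600), x3 = rnorm(600),
                 y = rnorm(600))

# Hypothetical helper replacing the copy-pasted empty/full/step blocks
forward_step <- function(d, group_col) {
  d <- d[, setdiff(names(d), group_col), drop = FALSE]  # drop the grouping column
  empty <- lm(y ~ 1, data = d)                          # intercept-only start
  full  <- lm(y ~ ., data = d)                          # all candidate predictors
  step(empty, scope = list(lower = formula(empty), upper = formula(full)),
       direction = "forward", trace = 0)
}

# Named list of fitted models, one per team ("1", "2", ..., "30")
models <- lapply(split(df, df$team), forward_step, group_col = "team")

# Gather each model's R-squared and selected predictors into one table
summary_tbl <- data.frame(
  team       = names(models),
  r_squared  = sapply(models, function(m) summary(m)$r.squared),
  predictors = sapply(models, function(m)
    paste(attr(terms(m), "term.labels"), collapse = ", ")),
  row.names  = NULL
)
head(summary_tbl)

coef(models[["1"]])   # coefficients for team 1
```

To fit one model per player instead, split on the player ID column and pass that column name as `group_col`. With over 200 candidate predictors, a group that has fewer rows than predictors cannot support the full model, so per-player fits may need a smaller candidate set.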