I am using NBA shot data to build shot prediction models with different regression techniques. When I fit a logistic regression model, I get the following warning:
Warning message: glm.fit: algorithm did not converge
The predictions also do not seem to work at all: they are identical to the original Y variable (make or miss). My code is below. I got the data from here: Shot Data.
nba_shots <- read.csv("shot_logs.csv")
library(dplyr)
library(ggplot2)
library(data.table)
library("caTools")
library(glmnet)
library(caret)
nba_shots_clean <- data.frame(
  "game_id" = nba_shots$GAME_ID,
  "location" = nba_shots$LOCATION,
  "shot_number" = nba_shots$SHOT_NUMBER,
  "closest_defender" = nba_shots$CLOSEST_DEFENDER,
  "defender_distance" = nba_shots$CLOSE_DEF_DIST,
  "points" = nba_shots$PTS,
  "player_name" = nba_shots$player_name,
  "dribbles" = nba_shots$DRIBBLES,
  "shot_clock" = nba_shots$SHOT_CLOCK,
  "quarter" = nba_shots$PERIOD,
  "touch_time" = nba_shots$TOUCH_TIME,
  "game_result" = nba_shots$W,
  "FGM" = nba_shots$FGM
)
mean(nba_shots_clean$shot_clock) # NA
# this returned NA, which means there are NAs in this column that need
# to be cleaned up
# if the shot clock is NA, I assume it was the end of a quarter and the
# shot clock was off.
# For now I'm just going to set all of these NAs to zero, so a zero
# means it is the end of a quarter
# checking the number of NAs
last_shots <- nba_shots_clean[is.na(nba_shots_clean$shot_clock),]
nrow(last_shots) # this tells me there are 5567 shots taken when the shot
# clock was turned off at the end of a quarter
# setting these NAs equal to zero (note: this replaces NAs in every
# column of the data frame, not just shot_clock)
nba_shots_clean[is.na(nba_shots_clean)] <- 0
# checking to see if it worked
nrow(nba_shots_clean[is.na(nba_shots_clean$shot_clock),]) # it worked
# create a test and train set
split = sample.split(nba_shots_clean, SplitRatio=0.75)
nbaTrain = subset(nba_shots_clean, split==TRUE)
nbaTest = subset(nba_shots_clean, split==FALSE)
# logistic regression
nbaLogitModel <- glm(FGM ~ location + shot_number + defender_distance +
points + dribbles + shot_clock + quarter + touch_time, data=nbaTrain,
family="binomial", na.action = na.omit)
nbaPredict = predict(nbaLogitModel, newdata=nbaTest, type="response")
cm = table(nbaTest$FGM, nbaPredict > 0.5)
print(cm)
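One detail worth double-checking in the split step: as far as I can tell, sample.split() expects the outcome vector, and when it is handed a whole data frame it returns one logical value per column, which subset() then recycles across the rows. A minimal sketch of the difference, using a made-up toy_shots data frame (hypothetical, not the real shot data):

```r
library(caTools)

set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for nba_shots_clean: 1000 shots, 3 columns
toy_shots <- data.frame(
  FGM = rbinom(1000, 1, 0.45),
  shot_clock = runif(1000, 0, 24),
  dribbles = rpois(1000, 2)
)

# Passing the data frame: the result has one element per column
split_df <- sample.split(toy_shots, SplitRatio = 0.75)
print(length(split_df))

# Passing the outcome vector: the result has one element per row
split_y <- sample.split(toy_shots$FGM, SplitRatio = 0.75)
print(length(split_y))

# Train/test sets built from the row-level split
toy_train <- subset(toy_shots, split_y == TRUE)
toy_test  <- subset(toy_shots, split_y == FALSE)
```

In the original code this would correspond to sample.split(nba_shots_clean$FGM, SplitRatio = 0.75). This changes how rows are assigned to the two sets, but it is probably not what causes the convergence warning.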
The print(cm) call gives me the following output, which tells me the prediction didn't do anything: it simply reproduces the original make/miss values.
    FALSE  TRUE
  0 21428     0
  1     0 17977
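A perfect confusion matrix together with a convergence warning is the usual symptom of perfect separation, that is, some predictor determines the outcome exactly. A quick check is to cross-tabulate each suspicious predictor against FGM. The sketch below uses made-up data (the toy data frame is hypothetical) and assumes that points is 0 on every miss and 2 or 3 on every make, which is how the PTS column appears to be defined; that assumption should be verified against the real data.

```r
set.seed(1)

# Hypothetical toy data: points is 0 on a miss and 2 or 3 on a make,
# so it encodes the outcome FGM exactly (an assumption about PTS)
n <- 500
toy <- data.frame(
  FGM = rbinom(n, 1, 0.45),
  defender_distance = runif(n, 0, 10)
)
toy$points <- toy$FGM * sample(c(2, 3), n, replace = TRUE)

# Cross-tabulating the predictor against the outcome exposes the leak:
# every row with points == 0 is a miss, every other row is a make
print(table(points = toy$points, FGM = toy$FGM))

# With points in the formula the classes are perfectly separated,
# so glm() is expected to emit warnings for this fit
fit_leaky <- glm(FGM ~ points + defender_distance,
                 data = toy, family = "binomial")

# Without points the fit is an ordinary logistic regression
fit_clean <- glm(FGM ~ defender_distance,
                 data = toy, family = "binomial")

# The leaky model reproduces the outcome exactly
leaky_pred <- predict(fit_leaky, type = "response") > 0.5
print(table(toy$FGM, leaky_pred))
```

If the same pattern shows up in the real data, dropping points from the model formula would be the natural next experiment, since the points scored on a shot are only known after the shot is made or missed.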
I would really appreciate any guidance.