
I have set up a logistic regression model in R and successfully plotted the points of the model to show a relationship in the dataset. I am having trouble showing the line graph of the prediction. The model predicts readmission rates of a hospital based on the length of the initial stay (in days). Here is my code:

mydata <- read.csv(file = 'C:\\Users\\nickg\\Downloads\\3kfid8emf9rkc9ek30sf\\medical_clean.csv', header=TRUE)[,c("Initial_days","ReAdmis")]
head(mydata)
mydata$ReAdmis.f <- factor(mydata$ReAdmis)
logfit <- glm(mydata$ReAdmis.f ~ mydata$Initial_days, data = mydata, family = binomial)
summary(logfit)
range(mydata$Initial_days)
xweight <- seq(0, 79.992, .008)
yweight <- predict(logfit, list(xweight),  type = "response")
plot(mydata$Initial_days, mydata$ReAdmis.f, pch = 16, xlab = "Initial Days", ylab = "ReAdmission Y/N")
lines(xweight, yweight)

As you can see I have the model set up and ranges described by xweight and yweight, but nothing shows up for the line.

GrumpyWizard
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe try `lines(xweight, yweight + 1)`. Hard to say without being able to run and test to see what's going on. – MrFlick Apr 07 '21 at 03:59

1 Answer

Use `curve` for this:

plot(ReAdmis.f ~ Initial_days, data = mydata, 
     pch = 16, xlab = "Initial Days", ylab = "ReAdmission Y/N")

# x is created by the curve function based on the plot's x limits
# note that newdata must contain the x variable with exactly the same name as in the original data
curve(predict(logfit, newdata = data.frame(Initial_days = x), type = "response"),
      add = TRUE)

However, the issue here could be that your y variable is a factor (internally stored as the integer codes 1 and 2 if it has two levels), whereas logistic regression predictions always lie in the interval [0, 1], so the fitted line can fall outside the plotted y range. You should convert ReAdmis.f into 0/1 integer values before running the code.
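As a minimal sketch of the conversion and the plot together, the block below uses simulated data in place of `medical_clean.csv`. The "Yes"/"No" coding of `ReAdmis`, the simulated relationship, and the column name `ReAdmis01` are assumptions made for illustration; only `Initial_days`, `ReAdmis`, and `logfit` come from the question.

```r
# Simulated stand-in for medical_clean.csv (hypothetical data, not the asker's file)
set.seed(42)
n <- 500
Initial_days <- runif(n, min = 1, max = 72)
# Assumed relationship: longer initial stays make readmission more likely
p <- plogis(-8 + 0.15 * Initial_days)
mydata <- data.frame(
  Initial_days = Initial_days,
  ReAdmis = ifelse(rbinom(n, size = 1, prob = p) == 1, "Yes", "No")  # assumed coding
)

# Convert the Yes/No response to 0/1 so it is on the same scale as the predictions
mydata$ReAdmis01 <- as.integer(mydata$ReAdmis == "Yes")

# Bare column names in the formula, so predict() can match the columns of newdata
logfit <- glm(ReAdmis01 ~ Initial_days, data = mydata, family = binomial)

plot(ReAdmis01 ~ Initial_days, data = mydata,
     pch = 16, xlab = "Initial Days", ylab = "ReAdmission (0 = No, 1 = Yes)")
curve(predict(logfit, newdata = data.frame(Initial_days = x), type = "response"),
      add = TRUE)
```

With the response coded as 0/1, the points sit at y = 0 and y = 1 and the fitted probabilities are drawn between them.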

Roland
  • Thank you for your answer. I changed the values of ReAdmis in my file to ones and zeros to represent yes and no, and then dropped the `factor` call. When I attempt your solution I get this error: `Error in curve(predict(logfit, newdata = data.frame(Initial_days = x), : 'expr' did not evaluate to an object of length 'n' In addition: Warning message: 'newdata' had 101 rows but variables found have 10000 rows` – GrumpyWizard Apr 09 '21 at 01:00
  • See the second line of the comment in my answer. For this to work you cannot use `$` in model formulas, and you don't need to: `logfit <- glm(ReAdmis.f ~ Initial_days, data = mydata, family = binomial)` – Roland Apr 09 '21 at 05:27
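
The effect of `$` in the formula can be reproduced on a small simulated data set. This is a sketch only: the names `d`, `x`, `y`, `fit_dollar`, `fit_plain`, and `newd` are hypothetical and not taken from the question. When the formula is written as `d$y ~ d$x`, the predictor is stored under the name `d$x`, so `predict()` finds no matching column in `newdata` and falls back to the original data.

```r
# Hypothetical toy data, used only to show how the formula style affects predict()
set.seed(1)
d <- data.frame(x = runif(200, 0, 10))
d$y <- rbinom(200, size = 1, prob = plogis(-3 + 0.6 * d$x))

fit_dollar <- glm(d$y ~ d$x, family = binomial)        # predictor is named "d$x"
fit_plain  <- glm(y ~ x, data = d, family = binomial)  # predictor is named "x"

newd <- data.frame(x = seq(0, 10, length.out = 101))

# newdata has no "d" to look up, so the original 200 rows are used (with a warning)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = newd, type = "response"))
# newdata is matched by column name, giving one prediction per row of newd
p_plain <- predict(fit_plain, newdata = newd, type = "response")

length(p_dollar)  # 200
length(p_plain)   # 101
```

`curve` evaluates its expression at 101 x values by default, which is why the error above reports 101 rows in `newdata` against 10000 rows found in the original data.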