This is the code I wrote:

#HLR
setwd("c:/users/miria/desktop")
patient1 <-read.csv("patient1.csv")
library(readr)
Statistics <- read_csv("patient1.csv")

#step 1
# here we are essentially comparing against a model with no predictors at all
step1 <- lm (Symptoms~ Meeting + Alliance,
             data= Statistics)
summary(step1)

#step 2

step2 <- lm (Symptoms~ Meeting + Alliance
             + Adherence  ,
             data= Statistics)
summary(step2)


#step 3

step3 <- lm (Symptoms~ Meeting + Alliance + Adherence 
             +Competence  ,
             data= Statistics)
summary(step3)


> dput (patient1)
structure(list(Meeting = 1:5, Competence = c(4.75, 4.44, 3.33, 
4.4, 3.8), Adherence = c(0.23, 1.65, 0.32, 1.54, 1.16), Alliance = c(12L, 
2L, 5L, 6L, 7L), Symptoms = c(37L, 46L, 47L, 48L, 40L)), class = "data.frame", row.names = c(NA, 
-5L))

> dput (Statistics)
structure(list(Meeting = c(1, 2, 3, 4, 5), Competence = c(4.75, 
4.44, 3.33, 4.4, 3.8), Adherence = c(0.23, 1.65, 0.32, 1.54, 
1.16), Alliance = c(12, 2, 5, 6, 7), Symptoms = c(37, 46, 47, 
48, 40)), row.names = c(NA, -5L), spec = structure(list(cols = list(
    Meeting = structure(list(), class = c("collector_double", 
    "collector")), Competence = structure(list(), class = c("collector_double", 
    "collector")), Adherence = structure(list(), class = c("collector_double", 
    "collector")), Alliance = structure(list(), class = c("collector_double", 
    "collector")), Symptoms = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), delim = ","), class = "col_spec"), problems = <pointer: 0x000002e762a9a090>, class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

In step 3 I expected to get output like the one I got after step 2, but instead NaN appeared in many of the columns.

I tried using fewer predictors: with either Adherence or Competence removed it worked fine, and only when using all 4 predictors at the same time did I again end up with many NaN.

Artem
miriam
    Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and provide your datasets using `dput(patient)` and `dput(Statistics1)`? – jrcalabrese Dec 12 '22 at 16:43

1 Answer

The NaNs are generated because the residual degrees of freedom equal 0: in step 3 the number of observations (5) equals the number of estimated coefficients (4 predictors plus the intercept). Put differently, the data frame used in lm has as many columns (4 predictors plus 1 response) as it has rows.
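As a quick check on the data from the question, the residual degrees of freedom can be read directly off the fitted models. This is only a sketch: the data frame below is retyped from the `dput(patient1)` output in the question, and `df.residual()` is the base R accessor for the residual degrees of freedom.

```r
# Data retyped from the dput(patient1) output in the question
patient1 <- data.frame(
  Meeting    = 1:5,
  Competence = c(4.75, 4.44, 3.33, 4.4, 3.8),
  Adherence  = c(0.23, 1.65, 0.32, 1.54, 1.16),
  Alliance   = c(12, 2, 5, 6, 7),
  Symptoms   = c(37, 46, 47, 48, 40)
)

step2 <- lm(Symptoms ~ Meeting + Alliance + Adherence, data = patient1)
step3 <- lm(Symptoms ~ Meeting + Alliance + Adherence + Competence,
            data = patient1)

# Residual df = number of rows minus number of estimated coefficients
df.residual(step2)  # 1: 5 rows, 4 coefficients
df.residual(step3)  # 0: 5 rows, 5 coefficients
```

With one residual degree of freedom left, step 2 can still report standard errors; step 3 has none left.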

Geometrical interpretation: you are fitting a hyperplane in 5-dimensional space (4 predictors plus the response) through 5 points, which can be done in only one way. That leaves no room for uncertainty, error or p-values (somewhere in the computation zero is divided by zero, hence the NaNs).

The solution is either to reduce the number of regressors (as you did in the previous steps) or to increase the number of observations (rows in your table).

See the simple example below for 2D case:

df <- data.frame(x = 1:2, y = 3:4)
summary(lm(y ~ x, data = df))

Output:

Call:
lm(formula = y ~ x, data = df)

Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        2        NaN     NaN      NaN
x                  1        NaN     NaN      NaN

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 1 and 0 DF,  p-value: NA
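For contrast, here is a minimal sketch of the same 2D case with one extra observation. The third point is made up purely for illustration; the names `df3` and `fit3` are hypothetical.

```r
# Same simple regression, but with 3 points instead of 2
# (the third point is invented for illustration only)
df3 <- data.frame(x = 1:3, y = c(3, 4, 6))
fit3 <- lm(y ~ x, data = df3)

# One residual degree of freedom is now left over
df.residual(fit3)

# Coefficient table: standard errors, t values and p-values are finite
coef(summary(fit3))
```

The line no longer passes exactly through every point, so there is residual variation from which standard errors can be estimated.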
Artem