1

I have a data set that looks like

    "","OBSERV","DIOX","logDIOX","OXYGEN","LOAD","PRSEK","PLANT","TIME","LAB"
"1",1011,984.06650389,6.89169348002254,"L","H","L","RENO_N","1","KK"
"2",1022,1790.7973641,7.49041625445373,"H","H","L","RENO_N","1","USA"
"3",1031,661.95870145,6.4952031694744,"L","H","H","RENO_N","1","USA"
"4",1042,978.06853583,6.88557974511529,"H","H","H","RENO_N","1","KK"
"5",1051,270.92290942,5.60183431332639,"N","N","N","RENO_N","1","USA"
"6",1062,402.98269729,5.99889362626069,"N","N","N","RENO_N","1","USA"
"7",1071,321.71945701,5.77367991426247,"H","L","L","RENO_N","1","KK"
"8",1082,223.15260359,5.40785585845064,"L","L","L","RENO_N","1","USA"
"9",1091,246.65350151,5.507984523849,"H","L","H","RENO_N","1","USA"
"10",1102,188.48323034,5.23900903921703,"L","L","H","RENO_N","1","KK"
"11",1141,267.34994025,5.58855843790491,"N","N","N","RENO_N","1","KK"
"12",1152,452.10355987,6.11391126834609,"N","N","N","RENO_N","1","KK"
"13",2011,2569.6672555,7.85153169693888,"N","N","N","KARA","1","USA"
"14",2021,604.79620572,6.40489155123453,"N","N","N","KARA","1","KK"
"15",2031,2610.4804449,7.86728956188212,"L","H",NA,"KARA","1","KK"
"16",2032,3789.7097503,8.24004471210954,"L","H",NA,"KARA","1","USA"
"17",2052,338.97054188,5.82591320649553,"L","L","L","KARA","1","KK"
"18",2061,391.09027375,5.96893841249289,"H","L","H","KARA","1","USA"
"19",2092,410.04420258,6.01626496505788,"N","N","N","KARA","1","USA"
"20",2102,313.51882368,5.74785940190679,"N","N","N","KARA","1","KK"
"21",2112,1242.5931417,7.12495571830002,"H","H","H","KARA","1","KK"
"22",2122,1751.4827969,7.46821802066524,"H","H","L","KARA","1","USA"
"23",3011,60.48026048,4.10231703874031,"N","N","N","RENO_S","1","KK"
"24",3012,257.27729731,5.55015448107691,"N","N","N","RENO_S","1","USA"
"25",3021,46.74282552,3.84466077914493,"N","N","N","RENO_S","1","KK"
"26",3022,73.605375516,4.29871805996994,"N","N","N","RENO_S","1","KK"
"27",3031,108.25433812,4.68448344109116,"H","H","L","RENO_S","1","KK"
"28",3032,124.40704234,4.82355878915293,"H","H","L","RENO_S","1","USA"
"29",3042,123.66859296,4.81760535031397,"L","H","L","RENO_S","1","KK"
"30",3051,170.05332632,5.13611207209694,"N","N","N","RENO_S","1","USA"
"31",3052,95.868704018,4.56297958887925,"N","N","N","RENO_S","1","KK"
"32",3061,202.69261215,5.31169060558111,"N","N","N","RENO_S","1","USA"
"33",3062,70.686307069,4.25825187761015,"N","N","N","RENO_S","1","USA"
"34",3071,52.034715526,3.95191110210073,"L","H","H","RENO_S","1","KK"
"35",3072,93.33525462,4.53619789950355,"L","H","H","RENO_S","1","USA"
"36",3081,121.47464906,4.79970559129829,"H","H","H","RENO_S","1","USA"
"37",3082,94.833869239,4.55212661590867,"H","H","H","RENO_S","1","KK"
"38",3091,68.624596439,4.22865101914209,"H","L","L","RENO_S","1","USA"
"39",3092,64.837097371,4.17187792984139,"H","L","L","RENO_S","1","KK"
"40",3101,32.351569811,3.47666254561192,"L","L","L","RENO_S","1","KK"
"41",3102,29.285124102,3.37707967726539,"L","L","L","RENO_S","1","USA"
"42",3111,31.36974463,3.44584388158928,"L","L","H","RENO_S","1","USA"
"43",3112,28.127853881,3.33676032670116,"L","L","H","RENO_S","1","KK"
"44",3121,91.825330102,4.51988818660262,"H","L","H","RENO_S","1","KK"
"45",3122,136.4559307,4.91600171048243,"H","L","H","RENO_S","1","USA"
"46",4011,126.11889968,4.83722511024933,"H","L","H","RENO_N","2","KK"
"47",4022,76.520259821,4.33755554003153,"L","L","L","RENO_N","2","KK"
"48",4032,93.551979795,4.53851721545715,"L","L","H","RENO_N","2","USA"
"49",4041,207.09703422,5.33318744777751,"H","L","L","RENO_N","2","USA"
"50",4052,383.44185307,5.94918798759058,"N","N","N","RENO_N","2","USA"
"51",4061,156.79345897,5.05492939129363,"N","N","N","RENO_N","2","USA"
"52",4071,322.72413197,5.77679787769979,"L","H","L","RENO_N","2","USA"
"53",4082,554.05710342,6.31726775620079,"H","H","H","RENO_N","2","USA"
"54",4091,122.55552697,4.80856420867156,"N","N","N","RENO_N","2","KK"
"55",4102,112.70050456,4.72473389805434,"N","N","N","RENO_N","2","KK"
"56",4111,94.245481423,4.54590288271731,"L","H","H","RENO_N","2","KK"
"57",4122,323.16498582,5.77816298482521,"H","H","L","RENO_N","2","KK"

I define a linear model in R using lm as

lm1 <- lm(logDIOX ~ 1 + OXYGEN + LOAD + PLANT + TIME + LAB, data=data)

and I want to interpret the estimated coefficients. However, when I extract the coefficients I get multiple 'NAs' (I'm assuming it's due to linear dependencies among the variables). How can I then interpret the coefficients? I only have one intercept that somehow represents one of the levels of each of the included factors in the model. Is it possible to get an estimate for each factor level?

> summary(lm1)

Coefficients:

    Call:
lm(formula = logDIOX ~ OXYGEN + LOAD + PLANT + TIME + LAB, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.90821 -0.32102 -0.08993  0.27311  0.97758 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   7.2983     0.2110  34.596  < 2e-16 ***
OXYGENL      -0.4086     0.1669  -2.449 0.017953 *  
OXYGENN      -0.7567     0.1802  -4.199 0.000113 ***
LOADL        -1.0645     0.1675  -6.357 6.58e-08 ***
LOADN             NA         NA      NA       NA    
PLANTRENO_N  -0.6636     0.2174  -3.052 0.003664 ** 
PLANTRENO_S  -2.3452     0.1929 -12.158  < 2e-16 ***
TIME2        -0.9160     0.2065  -4.436 5.18e-05 ***
LABUSA        0.3829     0.1344   2.849 0.006392 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5058 on 49 degrees of freedom
Multiple R-squared:  0.8391,    Adjusted R-squared:  0.8161 
F-statistic:  36.5 on 7 and 49 DF,  p-value: < 2.2e-16
lala_12
  • 131
  • 2
  • 10

1 Answers1

0

For the NA part of your question you can have a look here:

[linear regression "NA" estimate just for last coefficient, actually of your variables can be described as a linear combination of the rest.

For the factors and their levels the way r works is showing intercept with first factor level and shows the difference of the intercept with the rest of the factors. I think it will be more clear with just one factor regression:

lm1 <- lm(logDIOX ~ 1 + OXYGEN , data=df)
> summary(lm1)

Call:
lm(formula = logDIOX ~ 1 + OXYGEN, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7803 -0.7833 -0.2027  0.6597  3.1229 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.5359     0.2726  20.305   <2e-16 ***
OXYGENL      -0.4188     0.3909  -1.071    0.289    
OXYGENN      -0.1896     0.3807  -0.498    0.621    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.188 on 54 degrees of freedom
Multiple R-squared:  0.02085,   Adjusted R-squared:  -0.01542 
F-statistic: 0.5749 on 2 and 54 DF,  p-value: 0.5662

what this result is saying is that for OXYGEN="H" intercept is 5.5359, for OXYGEN="L" intercept is 5.5359-0.4188=5.1171 and for OXYGEN="N" intercept is 5.5359-0.1896= 5.3463.

Hope this helps

UPDATE:

Following your comment I generalize to your model.

when

OXYGEN = "H"
LOAD = "H"
PLANT= "KARRA"
TIME=1
LAB="KK"

then:

logDIOX =7.2983

when

 OXYGEN = "L"
    LOAD = "H"
    PLANT= "KARRA"
    TIME=1
    LAB="KK"

then:

logDIOX =7.2983-0.4086 =6.8897

when

 OXYGEN = "L"
    LOAD = "L"
    PLANT= "KARRA"
    TIME=1
    LAB="KK"

then:

logDIOX =7.2983-0.4086-1.0645 =5.8252

etc.

Antonios
  • 1,919
  • 1
  • 11
  • 18
  • Thank you! However, I am interested in the case where the model includes more than just one factor. Is the estimate of the intercept in such cases, a combination of the first factor level for each of the respective factors? – lala_12 Mar 08 '18 at 18:47