I am studying how four factors (each at three levels) influence three different responses, and I want to build a multiple linear regression model. All the factors are continuous except for "Pretreatment", which is categorical. When I fit the model and ask for the coefficient estimates, I get the following:

scenedesmus$Pretreatment<-factor(scenedesmus$Pretreatment)
scenedesmus$Temperature<-factor(scenedesmus$Temperature)
scenedesmus$Time<-factor(scenedesmus$Time)
scenedesmus$Ratio<-factor(scenedesmus$Ratio)
modelpcrs<-lm(PCR~Temperature+Time+Ratio+Pretreatment,data = scenedesmus)
summary(modelpcrs)
Call:
lm(formula = PCR ~ Temperature + Time + Ratio + Pretreatment, 
    data = scenedesmus)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.048691 -0.009505  0.000000  0.009505  0.048691 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.729001   0.021045  34.640 6.88e-11 ***
Temperature30       0.075300   0.017183   4.382  0.00177 ** 
Temperature40       0.009804   0.017183   0.571  0.58226    
Time1               0.043401   0.017183   2.526  0.03246 *  
Time2               0.042694   0.017183   2.485  0.03473 *  
Ratio3             -0.121626   0.017183  -7.078 5.80e-05 ***
Ratio6             -0.038341   0.017183  -2.231  0.05259 .  
PretreatmentMortar -0.048297   0.017183  -2.811  0.02035 *  
PretreatmentNone   -0.069140   0.017183  -4.024  0.00300 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02976 on 9 degrees of freedom
Multiple R-squared:  0.9178,    Adjusted R-squared:  0.8447 
F-statistic: 12.56 on 8 and 9 DF,  p-value: 0.0004748

This is strange because I was expecting one coefficient per factor, not coefficients for some factor levels (and not others). I don't know how to get the correct result. Moreover, I would also like to tune the model, for example by removing "Time" as a factor (because it is not statistically significant) and including an interaction (e.g., Temperature*Pretreatment).

The dataset used is:

 dput(scenedesmus)
structure(list(Temperature = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("20", 
"30", "40"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), levels = c("0.5", 
"1", "2"), class = "factor"), Ratio = structure(c(2L, 2L, 3L, 
3L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 3L), levels = c("12", 
"3", "6"), class = "factor"), Pretreatment = structure(c(3L, 
3L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 
3L), levels = c("Discs", "Mortar", "None"), class = "factor"), 
    PRY = c(7.10618979550317, 6.99107348052751, 9.81654489395678, 
    10.0937678454159, 15.8872899104855, 16.5147395153748, 15.6085073784574, 
    15.8904572330355, 9.85155639002801, 10.3291566375677, 9.81557388225615, 
    10.1774212169006, 12.0972576247432, 11.1350551614397, 14.7591913822601, 
    14.8846506719242, 9.47697977090569, 10.8328555963545), CRY = c(12.9913707456184, 
    13.2037056981015, 14.6223886369729, 14.4156689100426, 20.8510599220091, 
    21.1334682925674, 20.7517385553227, 20.3784601114164, 13.1903022986714, 
    12.7481614338955, 14.3799945987187, 15.1548695641213, 16.3653561008515, 
    17.3492383422838, 22.4414097199122, 22.4340213280367, 14.0895227253865, 
    16.0388931794408), PCR = c(0.546993072143667, 0.529478135939726, 
    0.671336615218633, 0.700194205929921, 0.761941597689038, 
    0.781449560798454, 0.752154203217537, 0.779767320305684, 
    0.746878742196859, 0.810246770966047, 0.682585366418063, 
    0.671561122571146, 0.739199168670324, 0.641818098394623, 
    0.657676659642481, 0.663485625438106, 0.672626032522028, 
    0.675411668071984)), row.names = c(NA, -18L), class = "data.frame")
  • Please google "linear regression with R" or something along those lines. There are probably thousands of pages describing how factors are treated in regressions (they will be turned into dummies) and how you can add interaction terms. – deschen Mar 12 '23 at 17:53
  • A k-level factor contributes k-1 columns to the model matrix. Only if k=2 is there one. Look at `model.matrix(modelpcrs)` to see the model matrix that is being used, and note that the columns are basically indicator functions of whether the corresponding level is being used, assuming default treatment contrasts, with the missing level being absorbed into the intercept. – G. Grothendieck Mar 12 '23 at 17:58
  • Your primary question is answered by the linked duplicate. If you want answers to your secondary questions ("I also would like to tune the model, for example removing the "time" as factor (because it is not statistically significant) and include an interaction (i.e:Temperature*Pretreatment)"), can you explain what part you're having trouble with? (It seems to me that it would be easy to modify the formula to exclude "Time" or add "Temperature*Pretreatment", but I've been doing this for a long time so maybe I don't understand your confusion properly ...) – Ben Bolker Mar 12 '23 at 18:06
