2

I'm trying to use the glm() command in R on my dataset (data_2, see below). I use the following command:

glm(formula = GRM_KerkMeerlo ~ ((fGS*0.865 + fQS*EC_QS + fXS*EC_fXS)/Q), data = data_2)

I thought that I was doing the right thing, but when I run this input I get an error:

Error in terms.formula(formula, data = data) : invalid model formula in ExtractVars

I've looked all over this website to find out what I did wrong but without result, so here I am, asking what on Earth I have done wrong.

I think I've made just a little mistake that can be solved easily, I just can't see what.

Thank you!

 data_2
         date P ETpot       Q   T     fXS GRM_KerkMeerlo GRM_SintJorisweg    fGS

        fQS
    1  2016032400 0 0.000 0.00816 4.0 0.00431        0.49775          0.45550     NA     NA
    2  2016032401 0 0.000 0.00814 4.6 0.00431        0.49675          0.45675 0.0022 0.0022
    3  2016032402 0 0.000 0.00816 5.2 0.00431        0.49725          0.45600 0.0022 0.0021
    4  2016032403 0 0.000 0.00812 5.3 0.00431        0.49600          0.45625 0.0021 0.0021
    5  2016032404 0 0.000 0.00809 5.5 0.00431        0.49550          0.45525 0.0021 0.0020
    6  2016032405 0 0.001 0.00812 6.5 0.00431        0.49525          0.45425 0.0021 0.0020
    7  2016032406 0 0.009 0.00807 6.9 0.00431        0.49575          0.45550 0.0021 0.0019
    8  2016032407 0 0.018 0.00803 7.0 0.00431        0.49600          0.45400 0.0021 0.0019
    9  2016032408 0 0.032 0.00801 7.4 0.00431        0.49700          0.45175 0.0021 0.0019
    10 2016032409 0 0.060 0.00801 8.3 0.00431        0.49800          0.45050 0.0021 0.0018
    11 2016032410 0 0.066 0.00788 8.7 0.00431        0.49825          0.45100 0.0021 0.0018
    12 2016032411 0 0.088 0.00799 9.5 0.00431        0.49850          0.45075 0.0021 0.0018
    13 2016032412 0 0.102 0.00797 9.6 0.00431        0.50200          0.45025 0.0021 0.0017
    14 2016032413 0 0.077 0.00805 9.9 0.00431        0.50150          0.45000 0.0021 0.0017
    15 2016032414 0 0.074 0.00801 9.9 0.00431        0.50075          0.44775 0.0021 0.0017
    16 2016032415 0 0.053 0.00803 9.7 0.00431        0.50150          0.44825 0.0021 0.0016
    17 2016032416 0 0.018 0.00799 9.4 0.00431        0.50050          0.44725 0.0021 0.0016
    18 2016032417 0 0.003 0.00803 8.8 0.00431        0.49975          0.45000 0.0021 0.0016
    19 2016032418 0 0.000 0.00793 8.1 0.00431        0.49925          0.45025 0.0021 0.0015
    20 2016032419 0 0.000 0.00791 8.1 0.00431        0.49825          0.44600 0.0021 0.0015
    21 2016032420 0 0.000 0.00797 8.1 0.00431        0.49875          0.44475 0.0021 0.0015
    22 2016032421 0 0.000 0.00791 7.3 0.00431        0.50100          0.44300 0.0021 0.0014
    23 2016032422 0 0.000 0.00784 7.0 0.00431        0.50100          0.44125 0.0021 0.0014
    24 2016032423 0 0.000 0.00795 7.6 0.00431        0.50050          0.44050 0.0021 0.0014
    25 2016032500 0 0.000 0.00791 7.6 0.00374        0.50075          0.43975 0.0021 0.0013
acylam
  • 18,231
  • 5
  • 36
  • 45
  • Please provide a [reproducible example](https://stackoverflow.com/a/5963610/3625022) so that we can replicate your problem. – Clarinetist Nov 29 '17 at 16:16
  • 4
    It looks like maybe you're trying to do a bunch of arithmetic in your formula (multiplication, division). Model formulas in R have a special syntax, and while combining variables arithmetically is possible, it's typically not what you do. Try adjusting the columns in your data frame such that you can simply write the formula as `response ~ term1 + term2 + ...` etc without any modifications. – joran Nov 29 '17 at 16:18

1 Answers1

2

Like @joran said in comments, you are trying to do arithmetic on variables in your formula. While doing the arithmetic on the variables before feeding into the formula might be more desirable, it is possible to perform the calculations inside the formula with the I function, which "Inhibits the interpretation of operators such as "+", "-", "/", "^" as formula operators" (?I):

mod = glm(formula = GRM_KerkMeerlo ~ I((fGS*0.865 + fQS*fXS)/Q), data = df)

Result:

> summary(mod)

Call:
glm(formula = GRM_KerkMeerlo ~ I((fGS * 0.865 + fQS * fXS)/Q), 
    data = df)

Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-0.0030453  -0.0021617  -0.0000753   0.0017784   0.0032462  

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     0.45976    0.03871  11.877 4.85e-11 ***
I((fGS * 0.865 + fQS * fXS)/Q)  0.17041    0.16921   1.007    0.325    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 4.71708e-06)

    Null deviance: 0.00010856  on 23  degrees of freedom
Residual deviance: 0.00010378  on 22  degrees of freedom
  (1 observation deleted due to missingness)
AIC: -222.32

Number of Fisher Scoring iterations: 2

Note that I changed your formula for demonstrative purposes, because both EC_QS and EC_fXS are not provided.

Data:

df = structure(list(date = c(2016032400L, 2016032401L, 2016032402L, 
2016032403L, 2016032404L, 2016032405L, 2016032406L, 2016032407L, 
2016032408L, 2016032409L, 2016032410L, 2016032411L, 2016032412L, 
2016032413L, 2016032414L, 2016032415L, 2016032416L, 2016032417L, 
2016032418L, 2016032419L, 2016032420L, 2016032421L, 2016032422L, 
2016032423L, 2016032500L), P = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L), ETpot = c(0, 0, 0, 0, 0, 0.001, 0.009, 0.018, 0.032, 
0.06, 0.066, 0.088, 0.102, 0.077, 0.074, 0.053, 0.018, 0.003, 
0, 0, 0, 0, 0, 0, 0), Q = c(0.00816, 0.00814, 0.00816, 0.00812, 
0.00809, 0.00812, 0.00807, 0.00803, 0.00801, 0.00801, 0.00788, 
0.00799, 0.00797, 0.00805, 0.00801, 0.00803, 0.00799, 0.00803, 
0.00793, 0.00791, 0.00797, 0.00791, 0.00784, 0.00795, 0.00791
), T = c(4, 4.6, 5.2, 5.3, 5.5, 6.5, 6.9, 7, 7.4, 8.3, 8.7, 9.5, 
9.6, 9.9, 9.9, 9.7, 9.4, 8.8, 8.1, 8.1, 8.1, 7.3, 7, 7.6, 7.6
), fXS = c(0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 
0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 
0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 0.00431, 
0.00431, 0.00431, 0.00431, 0.00431, 0.00374), GRM_KerkMeerlo = c(0.49775, 
0.49675, 0.49725, 0.496, 0.4955, 0.49525, 0.49575, 0.496, 0.497, 
0.498, 0.49825, 0.4985, 0.502, 0.5015, 0.50075, 0.5015, 0.5005, 
0.49975, 0.49925, 0.49825, 0.49875, 0.501, 0.501, 0.5005, 0.50075
), GRM_SintJorisweg = c(0.4555, 0.45675, 0.456, 0.45625, 0.45525, 
0.45425, 0.4555, 0.454, 0.45175, 0.4505, 0.451, 0.45075, 0.45025, 
0.45, 0.44775, 0.44825, 0.44725, 0.45, 0.45025, 0.446, 0.44475, 
0.443, 0.44125, 0.4405, 0.43975), fGS = c(NA, 0.0022, 0.0022, 
0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 
0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 
0.0021, 0.0021, 0.0021, 0.0021, 0.0021, 0.0021), fQS = c(NA, 
0.0022, 0.0021, 0.0021, 0.002, 0.002, 0.0019, 0.0019, 0.0019, 
0.0018, 0.0018, 0.0018, 0.0017, 0.0017, 0.0017, 0.0016, 0.0016, 
0.0016, 0.0015, 0.0015, 0.0015, 0.0014, 0.0014, 0.0014, 0.0013
)), .Names = c("date", "P", "ETpot", "Q", "T", "fXS", "GRM_KerkMeerlo", 
"GRM_SintJorisweg", "fGS", "fQS"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25"))
acylam
  • 18,231
  • 5
  • 36
  • 45