Problem with lm() giving incorrect result when using 1/var as second variable

Question

I have a data frame with two variables. I want to plot the inverse of each variable and fit a linear regression to the inverted values.

linear_mod <- lm((1/df$var1[3:length(df$var1)]) ~       # 3 because the first two are 0 and would result in 1/0
                 (1/df$var2[3:length(df$var2)]))
png("lineweaver-burk_glc6p.png", height = 400)
plot(1/(df$var2), 1/(df$var1))                          # x = 1/var2, y = 1/var1, matching the regression
abline(linear_mod)
dev.off()                                               # close the png device so the file is written

However, this just results in y = 0.897, with no slope.

I know I can assign 1/var to two temporary variables and use those in the formula to get it to work, like so:

temp1 <- 1/df$var1[3:length(df$var1)]
temp2 <- 1/df$var2[3:length(df$var2)]
linear_mod <- lm(temp1 ~ temp2)

This gives the correct regression line (0.83022 + 0.01768x), but I would like to know why lm() does not work when the expressions are written inline as above.

I have tested using one temp variable and one expression written out explicitly. This only gives a slope when the temp variable is in the second position, so lm() only seems to accept 1/df$var1[3:length(df$var1)] when it comes before ~, not after it.

Removing the 1/ from the second variable also gives a slope:

linear_mod <- lm((1/df$var1[3:length(df$var1)]) ~       # 3 because the first two are 0 and would result in 1/0
                 (df$var2[3:length(df$var2)]))

Putting, for example, 2+df$var... as the second variable also gives a slope. So there seems to be something specific about /, and an inconsistency between how this arithmetic is handled in the first and the second variable. I wonder why this is the case. Putting the second variable inside c() does make it work, but I don't see why that isn't also necessary for the first variable.

Here is a table with the variables in the data frame.

| var2 | var1                |
|------|---------------------|
| 0    | 0.0133976420150054  |
| 0    | 0.00803858520900322 |
| 0.1  | 1.17363344051447    |
| 0.1  | 1.13076098606645    |
| 0.2  | 2.05787781350482    |
| 0.2  | 2.18113612004287    |
| 0.2  | 1.7524115755627     |
| 0.2  | 0.844051446945338   |
| 0.2  | 1.42550911039657    |
| 0.3  | 0.244908896034298   |
| 0.3  | 0.616291532690247   |
| 0.3  | 1.39067524115756    |
| 0.3  | 0.669882100750268   |
| 0.3  | 1.66934619506967    |
| 0.3  | 1.56752411575563    |
| 0.3  | 1.33976420150054    |
| 0.4  | 1.83547695605573    |
| 0.4  | 1.77920685959271    |
| 0.5  | 1.83547695605573    |
| 0.5  | 1.84887459807074    |
| 1    | 1.92390139335477    |
| 2    | 1.94533762057878    |
| 2    | 1.7470525187567     |

Please provide enough code so others can better understand or reproduce the problem. — Community, Feb 25 '23 at 22:52
You can try something like `lm(y ~ I(1/x))` if you don't want to create an external variable (`I()` means to treat the expression "as is" rather than as part of a formula). Look at the documentation for `?formula` or see the responses here: https://stackoverflow.com/questions/29880938/fit-a-curve-model-to-1-x-data — nrennie, Feb 25 '23 at 23:06
Using 1/x inside a formula is equivalent to `1 + x %in% 1`, i.e. it fits the model `y ~ 1`, with just an intercept. — nrennie, Feb 25 '23 at 23:09
Thanks, that explains it, but am I understanding correctly that only the second variable is treated as part of a formula and that is why it is only necessary on the second variable? — DragonStaty, Feb 26 '23 at 09:06

score 0 · Answer 1 · answered Feb 25 '23 at 23:15

When you use the formula interface of lm(), arithmetic on the right-hand side has to be wrapped in I(). See ?formula for details. Inside a formula, operators such as +, :, *, ^ and / are not arithmetic; they describe the structure of the model. In particular, a/b is read as a + b %in% a (b nested within a), so 1/var2 is not evaluated as the element-wise inverse of var2, and the model ends up with only an intercept. I() tells R to treat the expression "as is", that is, as ordinary arithmetic. Function calls in a formula are also evaluated as is, which is why wrapping the term in c() works too. The left-hand side of ~ is evaluated as a normal expression, which is why 1/var1 works there without I(); wrapping it anyway is harmless. To skip the first two rows of your data, you can pass data = df[-c(1:2), ] to lm(), like this:

linear_mod <- lm(I(1/var1) ~ I(1/var2), data=df[-c(1:2),])
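To see the difference directly, here is a small sketch that rebuilds the data frame from the table in the question and fits the model both ways. The object names `dat`, `fit_bad` and `fit_good` are made up for this illustration, and it assumes the table in the question is the complete data set.

```r
# Data copied from the table in the question
df <- data.frame(
  var2 = c(0, 0, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2,
           0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,
           0.4, 0.4, 0.5, 0.5, 1, 2, 2),
  var1 = c(0.0133976420150054, 0.00803858520900322,
           1.17363344051447, 1.13076098606645,
           2.05787781350482, 2.18113612004287, 1.7524115755627,
           0.844051446945338, 1.42550911039657,
           0.244908896034298, 0.616291532690247, 1.39067524115756,
           0.669882100750268, 1.66934619506967, 1.56752411575563,
           1.33976420150054,
           1.83547695605573, 1.77920685959271,
           1.83547695605573, 1.84887459807074,
           1.92390139335477, 1.94533762057878, 1.7470525187567)
)

# Drop the two rows where var2 is 0 (1/0 would be Inf)
dat <- df[-c(1:2), ]

# Without I(): "/" on the right-hand side is a formula operator,
# so only an intercept is fitted (the question reports y = 0.897)
fit_bad <- lm(1/var1 ~ 1/var2, data = dat)

# With I(): the division is ordinary arithmetic and a slope is fitted
# (the question reports 0.83022 + 0.01768x for this regression).
# The left-hand side does not need I().
fit_good <- lm(1/var1 ~ I(1/var2), data = dat)

print(coef(fit_bad))
print(coef(fit_good))
```

Comparing the two sets of coefficients shows that only the version with I() has a slope term.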

Hope this helps.
