
I am trying to develop a binomial model in R.

I want to use a formula that looks like this: VAL = X0 + b1 * X1 + b2 * X2

Where X0, X1, and X2 are variables in my data frame and b1 and b2 are the coefficients I want to estimate. I want the target value Y to be TRUE/1 if this formula produces VAL > 0 and FALSE/0 if it produces VAL < 0.

Sample data with b1 and b2 set to 1:

| Target | X0 | X1 | X2 | VAL | Result |
|---|---|---|---|---|---|
| 1 | 86 | -54 | 17 | 49 | 1 |
| 0 | 0 | -54 | 17 | -37 | 0 |
| 1 | 40 | -15 | 23 | 48 | 1 |
| 0 | 50 | -20 | -25 | 5 | 1 |
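As a quick check of the arithmetic in the table, a short R sketch that recomputes VAL and Result for the four sample rows with b1 = b2 = 1 (the data frame name `df` is just for illustration):

```r
# Sample data from the question
df <- data.frame(
  Target = c(1, 0, 1, 0),
  X0 = c(86, 0, 40, 50),
  X1 = c(-54, -54, -15, -20),
  X2 = c(17, 17, 23, -25)
)

b1 <- 1
b2 <- 1

# VAL = X0 + b1 * X1 + b2 * X2; X0 enters with a fixed weight of 1
df$VAL <- df$X0 + b1 * df$X1 + b2 * df$X2

# Result is 1 when VAL > 0, otherwise 0
df$Result <- as.integer(df$VAL > 0)
print(df)
```

Only the last row's Result (1) differs from its Target (0).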

I want the value of X0 to be incorporated in the prediction, but I do not want this variable to have a coefficient (as this is a predefined formula that I can't change).

I need X0 in the model because if X1 and X2 are equal for two observations that have different X0 values (as in the first two observations), I want my formula to reflect that. One observation's X0 could make VAL negative while the other observation's X0 makes it positive, but this would not be reflected if X0 were left out of the model entirely. Also note the last observation: I would need to increase either b1 or b2 so that VAL becomes negative and the result is 0, which the model could not see without X0.
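To make the last point concrete, here is a sketch with hand-picked, hypothetical coefficients (b1 = 1.5, b2 = 1; these are not fitted values). Raising b1 flips the last row to a negative VAL while the other rows keep their sign, and dropping X0 makes the first two rows indistinguishable:

```r
df <- data.frame(
  Target = c(1, 0, 1, 0),
  X0 = c(86, 0, 40, 50),
  X1 = c(-54, -54, -15, -20),
  X2 = c(17, 17, 23, -25)
)

# Hypothetical coefficients chosen by hand: b1 raised from 1 to 1.5
b1 <- 1.5
b2 <- 1

# With X0 included: VAL is 22, -64, 40.5, -5, so Result matches Target
val <- df$X0 + b1 * df$X1 + b2 * df$X2
result <- as.integer(val > 0)

# Without X0: rows 1 and 2 get the same value and cannot be separated
val_no_x0 <- b1 * df$X1 + b2 * df$X2
print(data.frame(df, val, result, val_no_x0))
```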

I am currently using a formula like `glm(Y ~ X0 + X1 + X2, family = binomial(link = "logit"))`, but this model estimates a coefficient for X0. How can I fit a model that forces X0 to have no coefficient?

    What do you mean you want X0 incorporated in the prediction but not have a coefficient? Having a coefficient is what allows you to use X0 in the prediction. – mickey Nov 09 '18 at 18:03
  • Are you looking for a model without an intercept? – Henry Cyranka Nov 09 '18 at 18:05
    I don't understand how you can force something to be in the model without a coefficient. If you don't want it to have a coefficient, don't put it in the model. Maybe it would be easier to help with a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and the desired output. – MrFlick Nov 09 '18 at 18:11
  • You can fit a no-intercept model like this: `glm(status == 1 ~ 0 + age, data = lung, family = binomial)`. However, I highly recommend having sufficient justification for this; see this post https://stats.stackexchange.com/questions/260209/the-difference-between-with-or-without-intercept-model-in-logistic-regression – Mike Nov 09 '18 at 19:05
  • @mickey I edited my question to be more specific. To be clear, I don't mind having an intercept on top of X0 (if that is possible). I just don't want the model to develop a coefficient for X0 (making the model too dependent on X0), since I will not be able to use that coefficient in my predictions due to external restrictions. – Sarah Decker Nov 09 '18 at 19:19

2 Answers


As an update, I was looking for the offset() function. I added offset(X0) as a term in my model formula, which includes X0 in the linear predictor with its coefficient fixed at 1 instead of estimated.
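A minimal sketch of the offset approach on simulated data. The sample size, seed, and the "true" values b1 = 2 and b2 = -1 are made up for illustration. `0 +` drops the intercept so the linear predictor is exactly X0 + b1 * X1 + b2 * X2; leave it out to keep an intercept on top of X0. glm() also accepts an `offset` argument as an alternative to writing offset() in the formula.

```r
set.seed(1)
n <- 500
d <- data.frame(X0 = rnorm(n), X1 = rnorm(n), X2 = rnorm(n))

# Simulate from logit(p) = X0 + 2 * X1 - 1 * X2 (hypothetical true b1 = 2, b2 = -1)
eta <- d$X0 + 2 * d$X1 - 1 * d$X2
d$Target <- rbinom(n, size = 1, prob = plogis(eta))

# offset(X0) enters X0 with its coefficient fixed at 1; only X1 and X2 are estimated
fit <- glm(Target ~ 0 + X1 + X2 + offset(X0),
           family = binomial(link = "logit"), data = d)
print(coef(fit))

# The linear predictor includes the offset, so this is X0 + b1 * X1 + b2 * X2;
# predict class 1 when it is > 0 (equivalently, fitted probability > 0.5)
pred <- as.integer(predict(fit, type = "link") > 0)
print(table(predicted = pred, observed = d$Target))
```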


It looks like what you want is to have the coefficient for X0 be zero. If you can't change the formula (to omit X0), you could change the data. Here's an example:

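```r
set.seed(1)  # for reproducibility
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Copy of the data with x2 replaced by zeros
df0 <- df
df0$x2 <- 0

# True model: intercept 0.5, slope 1.5 on x1, slope -1.0 on x2, small noise
y <- 0.5 + 1.5 * df$x1 - 1.0 * df$x2 + rnorm(n, 0, 0.1)

mod1 <- lm(y ~ x1, data = df)        # x2 omitted from the formula
mod2 <- lm(y ~ x1 + x2, data = df)   # x2 coefficient is estimated
mod3 <- lm(y ~ x1 + x2, data = df0)  # x2 is all zeros, so its coefficient is NA
```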

It sounds like mod1 is what you want, but since you can't change the formula, you're stuck with mod2 or mod3. mod2 won't work because it estimates a coefficient for x2. mod3 matches mod1 except that the coefficient for x2 is NA; the intercept and x1 coefficients are the same.

An NA coefficient for x2 acts like a coefficient of zero. The predictions from mod1 and mod3 are the same, although calling predict() on mod3 with new data produces a warning about a rank-deficient fit.
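A self-contained check of the claim that mod1 and mod3 give the same fit might look like this (the seed is arbitrary):

```r
set.seed(42)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df0 <- df
df0$x2 <- 0
y <- 0.5 + 1.5 * df$x1 - 1.0 * df$x2 + rnorm(n, 0, 0.1)

mod1 <- lm(y ~ x1, data = df)
mod3 <- lm(y ~ x1 + x2, data = df0)

# x2 is NA in mod3: an all-zero column makes the design rank-deficient
print(coef(mod1))
print(coef(mod3))

# Fitted values agree up to floating-point tolerance
print(all.equal(fitted(mod1), fitted(mod3)))
```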

mickey