
I get this error depending on which variables I include and the sequence in which I specify them in the formula:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I've done a little research on this, and it looks like the error is caused by the variable in question not being a factor with two or more levels. In this case the variable (is_women_owned) is a factor with 2 levels ("No", "Yes"):

> levels(customer_accounts$is_women_owned)
[1] "No"  "Yes"
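Note that `levels()` reports the levels a factor *declares*, which a subset keeps even when a level no longer occurs. `lm()` calls `model.frame()` with `drop.unused.levels = TRUE`, so what matters for the contrasts error is how many levels actually occur in the rows being fitted. A minimal sketch with a hypothetical toy factor `x`:

```r
# Hypothetical toy factor: both levels are declared, but only "No" occurs
x <- factor(c("No", "No", "No"), levels = c("No", "Yes"))

levels(x)               # declared levels: "No" "Yes"
table(x)                # counts per level: No = 3, Yes = 0
nlevels(droplevels(x))  # levels actually present: 1
```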

No error:

f1 <- lm(combined_sales ~ is_women_owned, data=customer_accounts)

No error:

f2 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth, data=customer_accounts)

Regressing on the above formula plus the factor variable "is_women_owned":

f3 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth + is_women_owned, data=customer_accounts)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I get the same error when applying stepwise linear regression, as you would expect.

This seems like a bug. It should give a model in which "is_women_owned" perhaps offers no additional explanatory value because it is highly correlated with the other variables, rather than erroring out like this.
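The expectation about redundant predictors is right as far as exact collinearity goes: with the default `singular.ok = TRUE`, `lm()` fits the model and reports an aliased term's coefficient as `NA` instead of stopping. That suggests the contrasts error has a different cause. A minimal sketch with made-up variables `x1`, `x2`, `x3` and `y` (hypothetical, not the question's data):

```r
set.seed(1)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
d$x3 <- d$x1 + d$x2                          # exactly collinear with x1 + x2
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(10, sd = 0.1)

fit <- lm(y ~ x1 + x2 + x3, data = d)
coef(fit)  # x3 is aliased, so its coefficient is NA; no error is raised
```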

I verified that there is no missing data for this variable, too:

> which(is.na(customer_accounts$is_women_owned))
integer(0)
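Checking for NAs in this one column does not show which rows `lm()` actually uses: with the default `na.action` (`na.omit`), it drops every row that has an NA in *any* variable in the formula. If all the "Yes" rows happen to have an NA in some other predictor, the factor is left with one level and the contrasts error appears. A minimal sketch with a hypothetical data frame `acc` (column names are illustrative only):

```r
# Hypothetical data: is_wo has no NAs and both levels occur,
# but the only "Yes" row is missing a value in another predictor.
acc <- data.frame(
  sales   = c(10, 12, 9, 15, 11, 14),
  revenue = c(100, 120, 95, 150, 110, NA),
  is_wo   = factor(c("No", "No", "No", "No", "No", "Yes"))
)

anyNA(acc$is_wo)                          # FALSE: the factor column is complete
fit_ok <- lm(sales ~ is_wo, data = acc)   # fits: both levels are present

# Adding revenue makes lm() drop row 6, leaving only "No" for is_wo
res <- tryCatch(lm(sales ~ revenue + is_wo, data = acc),
                error = function(e) conditionMessage(e))
res  # the "contrasts can be applied only to factors with 2 or more levels" message

# Rows lm() keeps for that formula, and the levels that survive in them
used <- complete.cases(acc[, c("sales", "revenue", "is_wo")])
table(acc$is_wo[used])  # No = 5, Yes = 0
```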

Also, there are two values present in the factor variable:

> customer_accounts$is_women_owned[1:20]
 [1] No  No  No  No  No  No  No  No  No  No  No  No  No  No  Yes No 
[17] No  No  No  No 
Levels: No Yes
Nate Reed
  • For f2, is the value of is_women_owned always the same? That is what causes this error. Also, if the variable is the same across all responses, it gives no information about the response variable and should not be a predictor in your regression. – Marsenau Jan 15 '16 at 21:09
  • I'm having a hard time understanding what that means, actually. The factor variable is_women_owned has two values. There might be some relationship with the other variables. Maybe it's highly correlated with one or more of them; in that case, it's redundant in the model. But it's not limited to one value. – Nate Reed Jan 15 '16 at 21:17
  • You said "is_women_owned is always the same value for f2". Your data also shows no filtering to make this explicitly true. Is there a case in customer_accounts where f2 is a different value? – Marsenau Jan 15 '16 at 21:20
  • I think the wording of my comment on f2 was misleading/confusing. I'll edit it to make it clearer. First of all, I should have written f3. Second, my comment is about the regression model, not the values of the variables. What I meant was: is "is_women_owned" equivalent to "total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth"? The values of the predictor variables are the same for all models (f1, f2, f3). – Nate Reed Jan 15 '16 at 21:24
  • When I have done this in the past with contrived example variables, it didn't cause an error. For example, adding a third variable x3 = x1 + x2 and regressing on y ~ x1 + x2 + x3 works fine; the model just contains redundant information. – Nate Reed Jan 15 '16 at 21:26
  • However, this is just a hypothesis as to what's going on; I don't understand why introducing this two-level factor variable causes this error. Also, it works fine if it appears before the other variables in the formula. – Nate Reed Jan 15 '16 at 21:29

1 Answer

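# stringsAsFactors = TRUE makes f a factor in R >= 4.0.0 too, matching the str() output below
twofac = data.frame("y" = c(1,2,3,4,5,1), "x" = c(2,56,3,5,2,1), "f" = c("apple","apple","apple","apple","apple","banana"), stringsAsFactors = TRUE)
onefac = twofac[1:5,]  # f keeps both levels, but only "apple" occurs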

lm(y~x+f,data=twofac)
lm(y~x+f,data=onefac)

> str(onefac)
'data.frame':   5 obs. of  3 variables:
 $ y: num  1 2 3 4 5
 $ x: num  2 56 3 5 2
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1
> str(twofac)
'data.frame':   6 obs. of  3 variables:
 $ y: num  1 2 3 4 5 1
 $ x: num  2 56 3 5 2 1
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1 2
> lm(y~x+f,data=twofac)

Call:
lm(formula = y ~ x + f, data = twofac)

Coefficients:
(Intercept)            x      fbanana  
    3.30783     -0.02263     -2.28519  

> lm(y~x+f,data=onefac)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

If you run the above, you will notice that the model on twofac, which has a 2-level factor with both levels present, runs with no problem. The model on onefac, which has the same 2-level factor but with only one level present, gives the same error you got.

If your factor has only one of its levels present, then regressing on that factor gives no additional information, because it is constant across all responses.
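To find which variable is the culprit in a larger formula, one option is to build the model frame yourself: `model.frame()` applies the same NA handling `lm()` uses, so you can count how many distinct values each categorical column has in the rows that are actually fitted. A minimal sketch; `levels_in_use` is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: distinct values of each factor/character variable
# in the rows that lm() would use for formula `fml`
levels_in_use <- function(fml, data) {
  mf <- model.frame(fml, data = data)  # applies na.action (na.omit by default)
  is_cat <- vapply(mf, function(col) is.factor(col) || is.character(col),
                   logical(1))
  vapply(mf[is_cat], function(col) length(unique(col)), integer(1))
}

twofac <- data.frame(y = c(1, 2, 3, 4, 5, 1),
                     x = c(2, 56, 3, 5, 2, 1),
                     f = factor(c("apple", "apple", "apple",
                                  "apple", "apple", "banana")))
onefac <- twofac[1:5, ]

levels_in_use(y ~ x + f, twofac)  # f has 2 distinct values
levels_in_use(y ~ x + f, onefac)  # f has 1, so lm() fails on f
```

Applied to the full f3 formula and customer_accounts, any entry equal to 1 would point at the variable (is_women_owned, or sic or city if those are factors) that has only one value left after rows with NAs in other columns are dropped.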

Marsenau
  • My factor variable has two values, not one. – Nate Reed Jan 15 '16 at 21:18
  • I understand that, but the error is telling you that your factor with 2 levels has only one level present. Error or not, such a variable will not give you any additional information. – Marsenau Jan 15 '16 at 21:43
  • Right, I understand your explanation and that makes sense, but I verified that this variable has two values present. I edited my question above to show that. – Nate Reed Jan 15 '16 at 22:43
  • Very likely I'm not understanding something fundamental about linear regression with factor variables. – Nate Reed Jan 15 '16 at 22:47