
I have over 1000 columns in my data frame. I want to run a linear regression on all of the variables and do not want to write them out one by one. When I try this:

lm(goal ~ ., data = df)

I get this error:

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

I have columns of class character, factor, numeric, etc. I am guessing I should remove the columns that cannot be used as variables. How do I do this?
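A minimal reproduction of the error, using a small hypothetical data frame (the column names `goal`, `x` and `const` are illustrative only). A character column with a single distinct value is converted to a one-level factor when the model matrix is built, and that is what triggers the contrasts error:

```r
# Hypothetical toy data; 'const' is a character column with one distinct value
df <- data.frame(
  goal  = c(1.2, 2.3, 3.1, 4.8, 5.0),
  x     = c(1, 2, 3, 4, 5),
  const = rep("a", 5),
  stringsAsFactors = FALSE
)

# Capture the error message instead of stopping
res <- tryCatch(lm(goal ~ ., data = df),
                error = function(e) conditionMessage(e))
print(res)
```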

lmo
user6452857
    This seems to be saying that one of the factor variables you are passing to the regression has only one level, so find it and omit it. You can find these with `sapply(d[sapply(d, is.factor)], nlevels)`; look for those with one level. – user20650 Feb 25 '17 at 11:49
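The check in the comment above only looks at factor columns. A sketch that also covers character columns (which `lm()` converts to factors) and then drops the one-level columns before fitting; the data frame `d` and its columns are hypothetical:

```r
# Hypothetical data: 'f2' is a one-level factor, 's1' a one-value character column
d <- data.frame(
  goal = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1),
  x    = c(1, 2, 3, 4, 5, 6),
  f1   = factor(c("a", "b", "a", "b", "a", "b")),
  f2   = factor(rep("only", 6)),
  s1   = rep("same", 6),
  stringsAsFactors = FALSE
)

# lm() turns character columns into factors, so check both kinds
is_cat <- sapply(d, function(x) is.factor(x) || is.character(x))

# factor() rebuilds the levels from the data, so unused levels are not counted
lv  <- sapply(d[is_cat], function(x) nlevels(factor(x)))
bad <- names(lv)[lv < 2]
print(bad)  # "f2" "s1"

# Fit on the remaining columns
fit <- lm(goal ~ ., data = d[, setdiff(names(d), bad), drop = FALSE])
```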
    This seems like an extremely lazy attitude. – Pierre L Feb 25 '17 at 11:57
    Isn't lazy good? Shouldn't we always want to do things lazily to create more time for other activities? Am I actually supposed to write out 1000+ variables? I am planning on going to 3000+ variables. Why would I waste time writing that out? – user6452857 Feb 25 '17 at 13:36

1 Answer


You can exclude the offending variables with the subtraction (`-`) operator:

lm(goal ~ . - var, data = df)
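With many columns to exclude, writing `- a - b - c ...` by hand gets tedious, and a subtracted variable may still be carried into the model frame, where the same contrasts error can occur. A sketch of two alternatives, assuming a hypothetical character vector `drop_cols` holding the names of the columns to exclude: subset the data before fitting, or build the formula from the remaining names with `reformulate()`:

```r
# Hypothetical data: 'f2' is a one-level factor that must be excluded
d <- data.frame(
  goal = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1),
  x1   = c(1, 2, 3, 4, 5, 6),
  x2   = c(2, 1, 4, 3, 6, 5),
  f2   = factor(rep("only", 6))
)

# Hypothetical vector of columns to exclude (could be thousands long)
drop_cols <- c("f2")

# Option 1: remove the columns from the data, then use '.' as before
fit_a <- lm(goal ~ ., data = d[, setdiff(names(d), drop_cols), drop = FALSE])

# Option 2: build the formula from the remaining predictor names
predictors <- setdiff(names(d), c("goal", drop_cols))
f <- reformulate(predictors, response = "goal")
print(f)  # goal ~ x1 + x2
fit_b <- lm(f, data = d)
```

Option 1 keeps the `.` shorthand from the question; option 2 leaves the data frame untouched and only the listed predictors enter the model frame.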
salient
    To find the set of columns to remove this way, check for columns that have only one value. For numeric columns this means zero variance, but your data will likely be a mix of numeric and factor types. Something like `which(sapply(df, function(x) length(unique(x))) == 1)` should give you the indices of the columns to remove. – vincentmajor Feb 25 '17 at 18:39
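A sketch of the approach in the comment above, with one change that is an assumption about what you want: missing values are ignored when counting distinct values, because a column such as `c("a", NA)` has two unique values but only one level once `lm()` drops incomplete rows under the usual default `na.action = na.omit`. The data frame and column names are hypothetical:

```r
# Hypothetical mixed-type data frame
df <- data.frame(
  goal = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.1),
  num  = c(1, 2, 3, 4, 5, 6),
  zero = rep(0, 6),                       # zero-variance numeric
  chr  = c("a", "a", NA, "a", "a", "a"),  # one value plus an NA
  fac  = factor(c("u", "v", "u", "v", "u", "v")),
  stringsAsFactors = FALSE
)

# Count distinct non-missing values in every column, whatever its type
n_distinct <- sapply(df, function(x) length(unique(x[!is.na(x)])))

# Columns with fewer than two distinct values cannot help the regression
const_cols <- names(n_distinct)[n_distinct < 2]
print(const_cols)  # "zero" "chr"

fit <- lm(goal ~ ., data = df[, setdiff(names(df), const_cols), drop = FALSE])
```

Without the `!is.na(x)` filter, `chr` would count two unique values (`"a"` and `NA`) and would not be flagged.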