
I used the error message as the question title because I cannot work out how to interpret it, no matter how much I research. Whenever I run a logistic regression with bigglm() (from the biglm package, which is designed to run regressions over large amounts of data), I get:

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

This is what my bigglm() call looks like:

fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10) 

Here f is the formula and df is the data frame (a little over a million rows and about 210 variables).

So far I have tried changing my dependent variable to a numeric class but that didn't work. My dependent variable has no missing values.

Judging from the error message, I wonder if this might have anything to do with the family argument in the bigglm() function. I have found numerous other websites where people ask about the same error, and most of those questions are either unanswered or about a completely different case.
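For what it is worth, the message itself can be reproduced outside of bigglm(). The snippet below is a minimal sketch in base R, assuming a reasonably recent R version; it calls the inverse link of the same family directly, once with valid input and once with a zero-length vector. The names fam, p and msg are only illustrative.

```r
# Inverse link of the binomial family with a logit link, as in the call above
fam <- binomial(link = "logit")

# Valid input: a non-empty double vector of linear predictors
p <- fam$linkinv(c(-2, 0, 2))
print(p)

# Zero-length input: capture the error message instead of stopping
msg <- tryCatch(
  fam$linkinv(numeric(0)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that holds, the message would mean that the linear predictor eta ended up empty or non-numeric somewhere inside the fit, which hints at the data reaching the model rather than at the family argument itself. That is an inference from the message, not something verified against bigglm().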

jgozal
  • you need to provide a working, reproducible example – rawr Jan 08 '16 at 01:15
  • @rawr this is not a rhetorical question, but would you mind advising me on an appropriate way to create a reproducible example of a dataset containing 210 variables? – jgozal Jan 08 '16 at 03:13
  • you only need to use enough variables to reproduce the problem. if it absolutely only works for your exact data, then the best we could do is guess or post the entire data set. but a minimal example that reproduces the error is highly preferred – rawr Jan 08 '16 at 05:33
  • See the canonical post on [R reproducible examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – alistaire Jan 13 '16 at 18:15
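
One possible way to follow the advice in these comments is to cut the data down to a few rows and columns and share that with dput(). The snippet below is only a sketch: the simulated df, the column names y, x1, x2, x3 and the object small are hypothetical stand-ins, not the asker's data.

```r
set.seed(1)

# Hypothetical stand-in for the real data frame (simulated, not the asker's data)
df <- data.frame(
  y  = rbinom(200, 1, 0.5),
  x1 = rnorm(200),
  x2 = rnorm(200),
  x3 = rnorm(200)
)

# Keep only the columns suspected of being involved, and a small random sample of rows
keep_cols <- c("y", "x1", "x2")
small <- df[sample(nrow(df), 20), keep_cols]

# Print a copy-pasteable text representation that can be posted in the question
dput(small)

# Then rerun the failing call on the reduced data, for example:
# fit <- biglm::bigglm(y ~ x1 + x2, data = small,
#                      family = binomial(link = "logit"), chunksize = 10, maxit = 10)
```

If the error disappears on the reduced data, adding columns back a few at a time may help narrow down which ones trigger it.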

1 Answer


The error Argument eta must be a nonempty numeric vector looks to me like your data contains either empty values or NAs, so please check your data. Whatever advice we provide here cannot be tested until we see your code or the steps that lead to the error. Try this:

anyNA(df) # TRUE if df contains at least one NA
df[is.na(df)] <- 0 # caution: replacing NA with 0 changes the data and may well affect your model

Or, for whichever line of your code generates the NAs, pass the na.rm=TRUE argument.
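
A slightly more detailed check, as a sketch in base R only: count the NAs per column, look at the column classes, and keep only complete rows instead of filling in zeros. The small df below is a hypothetical stand-in for the real data, and the names na_counts, col_classes and complete are illustrative.

```r
# Hypothetical stand-in for the real data frame
df <- data.frame(
  y  = c(0, 1, NA, 1),
  x1 = c(1.5, NA, 3.2, 4.1),
  x2 = c("a", "b", "a", NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(df))
print(na_counts)

# First class of each column (numeric, character, factor, ...)
col_classes <- vapply(df, function(col) class(col)[1], character(1))
print(col_classes)

# Keep only rows without any missing value, rather than replacing NA with 0
complete <- df[complete.cases(df), , drop = FALSE]
print(nrow(complete))
```

Dropping incomplete rows is itself a modelling decision, so this only shows where the missing values are; it does not say how they should be handled.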

Again, we can only speculate. Hope it helps.

user5249203
  • Thank you for your answer. I am in the process of building a small reproducible example. I did try to do `df <- na.omit(df)` as well and it didn't solve the problem – jgozal Jan 08 '16 at 22:37
  • Check whether the values in your `df` are stored as numeric, character, or factor: `sapply(df, class)` (`sapply(df, mode)` would report factors as "numeric"). Ideally, numeric is what you want. – user5249203 Jan 11 '16 at 22:00
  • @jgozal whatever you do I wouldn't recommend replacing NAs with 0 like user5249203 advised. This will definitely affect your model and invalidate any possible results. – simtim Jan 24 '16 at 17:15