
I am doing logistic regression in R. Can somebody clarify the difference between running these two lines?

1. glm(Response ~ Temperature, data=temp, 
                    family = binomial(link="logit"))
2. glm(cbind(Response, n - Response) ~ Temperature, 
                    data=temp, family =binomial, Ntrials=n)

The data look like this (note: Response is binary; 0 = die, 1 = not die):

Response  Temperature
0         24.61
1         39.61
1         39.50
0         22.71
0         21.61
1         39.70
1         36.73
1         33.32
0         21.73
1         49.61
rcs
Eddie
  • Paul... the first line is straightforward to understand. :) I tried to figure out the second one because some examples in R use it. And those two generate different results. :) – Eddie Feb 02 '12 at 12:24
  • @James is right, I believe. If `n` is 1 then you should get exactly the same answer in this case. In general you should use the second form when you have more than one trial per observation. The `Ntrials` argument is bogus/unnecessary, as far as I can tell. – Ben Bolker Feb 02 '12 at 13:12
  • Thank you very much, Ben. Could you elaborate further on what you mean by "more than one trial per observation", please? :) – Eddie Feb 02 '12 at 15:39
  • Suppose your data are grouped so that you had measured multiple individuals (e.g. 10) at each temperature value; you then might have 7 out of 10 surviving at temp 22.71, so your estimation would be based on a binomial outcome of 7 surviving with probability p in N=10 trials. Usually when people say "logistic regression" they mean ungrouped data (`N=1`), reserving "binomial regression" for the grouped case, but the terms are somewhat interchangeable ... – Ben Bolker Feb 02 '12 at 19:30
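
To make the grouped case concrete, here is a minimal sketch with made-up numbers. The `grouped` data frame and its `Survived` and `n` columns are hypothetical and not taken from the question. It fits the grouped form, expands the same data to one 0/1 row per individual, and compares the coefficients, which should agree up to numerical tolerance.

```r
# Hypothetical grouped data: n individuals observed at each temperature
grouped <- data.frame(
  Temperature = c(20, 25, 30, 35, 40),
  n           = rep(10, 5),
  Survived    = c(1, 3, 5, 8, 9)
)

# Grouped (binomial) form: successes and failures per row
fit_grouped <- glm(cbind(Survived, n - Survived) ~ Temperature,
                   data = grouped, family = binomial)

# Expand to one row per individual (N = 1 trial per row)
ungrouped <- data.frame(
  Temperature = rep(grouped$Temperature, times = grouped$n),
  Response    = unlist(mapply(function(s, n) c(rep(1, s), rep(0, n - s)),
                              grouped$Survived, grouped$n, SIMPLIFY = FALSE))
)

# Ungrouped (logistic) form: binary response
fit_ungrouped <- glm(Response ~ Temperature, data = ungrouped, family = binomial)

# Compare the estimated coefficients of the two fits
all.equal(coef(fit_grouped), coef(fit_ungrouped), tolerance = 1e-4)
```

The two fits report different deviances and degrees of freedom, because the grouped fit has 5 observations and the ungrouped one has 50, but the estimated intercept and slope describe the same model.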

1 Answer


When fitting a binomial or quasibinomial glm, the left-hand side of the formula can take one of three forms: a probability (proportion) of success, a two-column matrix whose columns give the numbers of successes and failures, or a factor whose first level denotes failure and all other levels success. See the Details section of `?glm`.
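
As an illustration of these forms with one trial per row, the sketch below fits the same model three ways and checks that the coefficients match. The data values and the `Outcome` column are made up for the example; they are not the question's data.

```r
# Hypothetical ungrouped data (one trial per row); values are made up
temp <- data.frame(
  Temperature = c(21, 24, 27, 30, 33, 36, 39, 42),
  Response    = c(0, 0, 1, 0, 1, 0, 1, 1)
)

# 1. Numeric 0/1 response (a proportion of success with one trial per row)
fit_num <- glm(Response ~ Temperature, data = temp, family = binomial)

# 2. Factor response: the first level ("Die") is failure, the other is success
temp$Outcome <- factor(temp$Response, levels = c(0, 1),
                       labels = c("Die", "Not die"))
fit_fac <- glm(Outcome ~ Temperature, data = temp, family = binomial)

# 3. Two-column matrix of successes and failures, with n = 1 per row
fit_mat <- glm(cbind(Response, 1 - Response) ~ Temperature,
               data = temp, family = binomial)

# All three fits should give the same coefficients
all.equal(coef(fit_num), coef(fit_fac))
all.equal(coef(fit_num), coef(fit_mat))
```

With `n` fixed at 1, the matrix form reduces to the plain 0/1 form, which is why the two calls in the question should only differ when `n` is something other than 1.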

James
  • Note that when using the frequency form of a binomial glm, you should supply the number of trials per observation in the `weights` argument. It would look like: `glm(events/n ~ x, data=*, weights=n, ...)` – Hong Ooi Feb 02 '12 at 15:16
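
A minimal sketch of the frequency form next to the two-column matrix form, assuming a hypothetical data frame `d` with made-up columns `x`, `n`, and `events` (named to match the comment above). Both calls should give the same coefficients.

```r
# Hypothetical grouped data: `events` successes out of `n` trials at each `x`
d <- data.frame(
  x      = c(20, 25, 30, 35, 40),
  n      = c(10, 12, 8, 10, 15),
  events = c(1, 4, 4, 8, 13)
)

# Frequency (proportion) form: the number of trials goes in `weights`
fit_prop <- glm(events / n ~ x, data = d, weights = n, family = binomial)

# Two-column matrix form: successes and failures
fit_cbind <- glm(cbind(events, n - events) ~ x, data = d, family = binomial)

# The two parameterisations should agree
all.equal(coef(fit_prop), coef(fit_cbind))
```

Leaving out `weights = n` in the first call would treat every proportion as if it came from a single trial, so the fit would no longer reflect the differing group sizes.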