
I am trying to use speedglm to get faster GLM estimation than glm, but why is it even slower?

set.seed(0)
n <- 1e3
p <- 1e3
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

ptm <- proc.time()
fit=glm(y~x,family=binomial())
print(proc.time() - ptm)
#   user  system elapsed 
#  10.71    0.07   10.78 

library(speedglm)
ptm <- proc.time()
fit=speedglm(y~x,family=binomial())
print(proc.time() - ptm)
#   user  system elapsed 
#  15.11    0.12   15.25 
hxd1011
    you should explain the context in which you're working. If you want to do many small GLM fits as fast as possible (a reasonable possibility) you might want to consider using `glm.fit()` directly ... – Ben Bolker May 26 '17 at 18:32
  • @BenBolker I am trying fit a logistic regression with 3 million rows and ~1000 columns, want to see how fast it runs in different package. – hxd1011 May 26 '17 at 18:43
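Following up on Ben Bolker's comment, here is a minimal sketch of calling `glm.fit()` directly. It bypasses the formula-parsing and model-frame machinery of `glm()`, so the intercept column must be added to the design matrix by hand (the sizes below are just illustrative):

```r
set.seed(0)
n <- 1e4
p <- 20
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

# glm.fit() takes the design matrix directly; include the intercept column
X <- cbind(1, x)
fit_direct <- glm.fit(X, y, family = binomial())

# the coefficients should match those from the formula interface
fit_formula <- glm(y ~ x, family = binomial())
max(abs(fit_direct$coefficients - coef(fit_formula)))
```

For many small repeated fits, the saved model-frame overhead can dominate the cost of the fit itself.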

1 Answer


The efficiency of speedglm over glm lies in the way it reduces the n * p model matrix to a p * p matrix. However, if n = p, there is no effective reduction. What you really want to check is the n >> p case.
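A quick sketch of that check, with n >> p (assuming speedglm is installed; sizes are illustrative and timings will vary by machine):

```r
library(speedglm)

set.seed(0)
n <- 2e4   # n >> p this time
p <- 50
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

t_glm <- system.time(f1 <- glm(y ~ x, family = binomial()))
t_spd <- system.time(f2 <- speedglm(y ~ x, family = binomial()))
print(t_glm)
print(t_spd)

# the two solvers should agree on the estimates
max(abs(unname(coef(f1)) - unname(coef(f2))))
```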


More insight comes from the computational complexity of each iteration of Fisher scoring.

glm uses a QR factorization of the n * p model matrix, which takes 2np^2 - (2/3)p^3 FLOPs, while speedglm forms the matrix cross product of the n * p matrix (np^2 FLOPs) and then QR-factorizes the resulting p * p matrix ((4/3)p^3 FLOPs), for np^2 + (4/3)p^3 FLOPs in total. So when n >> p, speedglm does only about half the computation of glm. Furthermore, the blocking and caching strategy used by speedglm makes better use of computer hardware, giving high performance.

If you have n = p, you immediately see that glm takes 2p^3 - (2/3)p^3 = (4/3)p^3 FLOPs, but speedglm takes p^3 + (4/3)p^3 = (7/3)p^3 FLOPs, which is more expensive! In fact, the matrix cross product becomes sheer overhead in this case!
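A few lines of R arithmetic make both regimes concrete, using the per-iteration FLOP formulas above (the specific n and p are just illustrative):

```r
# per-iteration FLOP counts from the operation counts above
flop_glm      <- function(n, p) 2 * n * p^2 - (2/3) * p^3
flop_speedglm <- function(n, p) n * p^2 + (4/3) * p^3

# n >> p (e.g. the 3 million x 1000 case from the comments):
# the ratio approaches 2, i.e. speedglm does about half the work
flop_glm(3e6, 1e3) / flop_speedglm(3e6, 1e3)

# n = p (the benchmark in the question): speedglm does more work
flop_glm(1e3, 1e3) / flop_speedglm(1e3, 1e3)
```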

Zheyuan Li