I am trying to write a fast algorithm to compute the log gamma function. Currently my implementation seems naive, and just iterates 10 million times to compute the log of the gamma function (I am also using numba to optimise the code).
import numpy as np
from numba import njit
EULER_MAS = 0.577215664901532 # euler mascheroni constant
HARMONC_10MIL = 16.695311365860007 # sum of 1/k from 1 to 10,000,000
@njit(fastmath=True)
def gammaln(z):
"""Compute log of gamma function for some real positive float z"""
out = -EULER_MAS*z - np.log(z) + z*HARMONC_10MIL
n = 10000000 # number of iters
for k in range(1,n+1,4):
# loop unrolling
v1 = np.log(1 + z/k)
v2 = np.log(1 + z/(k+1))
v3 = np.log(1 + z/(k+2))
v4 = np.log(1 + z/(k+3))
out -= v1 + v2 + v3 + v4
return out
I timed my code against the scipy.special.gammaln implementation and mine is literally 100,000's times slower. So I am doing something very wrong or very naive (probably both). Although my answers are at least correct to within 4 decimal places at worst when compared to scipy.
I tried to read the _ufunc code implementing scipy's gammaln function, however I don't understand the cython code that the _gammaln function is written in.
Is there a faster and more optimised way I can calculate the log gamma function? How can I understand scipy's implementation so I can incorporate it with mine?