9

I need logit and inverse logit functions so that logit(inv_logit(n)) == n. I use numpy and here is what I have:

import numpy as np
def logit(p):
    return np.log(p) - np.log(1 - p)

def inv_logit(p):
    return np.exp(p) / (1 + np.exp(p))

And here are the values:

print logit(inv_logit(2)) 
2.0 

print logit(inv_logit(10))
10.0 

print logit(inv_logit(20))
20.000000018 #well, pretty close

print logit(inv_logit(50))
Warning: divide by zero encountered in log
inf 

Now let's test negative numbers

print logit(inv_logit(-10))
-10.0 
print logit(inv_logit(-20))
-20.0 
print logit(inv_logit(-200))
-200.0 
print logit(inv_logit(-500))
-500.0 
print logit(inv_logit(-2000))
Warning: divide by zero encountered in log
-inf 

So my questions are: what is the proper way to implement these functions so that the requirement logit(inv_logit(n)) == n will hold for any n in as wide a range as possible (at least [-1e4; 1e4)?

And also (and I'm sure this is connected to the first one), why are my function more stable with negative values, compared to the positive ones?

mskfisher
  • 3,291
  • 4
  • 35
  • 48
Boris Gorelik
  • 29,945
  • 39
  • 128
  • 170

5 Answers5

13

Either use

1. The bigfloat package with supports arbitrary precision floating point operations.

2. The SymPy symbolic math package. I'll give examples of both:

First, bigfloat:

http://packages.python.org/bigfloat/

Here's a simple example:

from bigfloat import *
def logit(p):
    with precision(100000):
        return log(p)- log(1 -BigFloat(p))

def inv_logit(p):
    with precision(100000):
        return exp(p) / (1 + exp(p))

int(round(logit(inv_logit(12422.0))))
# gives 12422
int(round(logit(inv_logit(-12422.0))))
# gives -12422

This is really slow. You may want to consider restructuring your problem and do some parts analytically. Cases like these are rare in real problems - I'm curious about what kind of problem you are working on.

Example installation:

wget http://pypi.python.org/packages/source/b/bigfloat/bigfloat-0.3.0a2.tar.gz
tar xvzf bigfloat-0.3.0a2.tar.gz 
cd bigfloat-0.3.0a2
as root:
python setup.py install

About the reason your functions wore better with negative values. Consider:

>>> float(inv_logit(-15))
3.059022269256247e-07

>>> float(inv_logit(15))
0.9999996940977731

In the first case floating point numbers represent this value easily. The decimal point is moved so that the leading zeroes: 0.0000... does not need to be stored. In the second case all the leading 0.999 needs to be stored, so you need all that extra precision to get an exact result when later doing 1-p in logit().

Here's the symbolic math way (significantly faster!):

from sympy import *
def inv_logit(p):
    return exp(p) / (1 + exp(p))
def logit(p):
    return log(p)- log(1 -p)

x=Symbol('x')
expr=logit(inv_logit(x))
# expr is now:
# -log(1 - exp(x)/(1 + exp(x))) + log(exp(x)/(1 + exp(x)))
# rewrite it: (there are many other ways to do this. read the doc)
# you may want to make an expansion (of some suitable kind) instead.
expr=cancel(powsimp(expr)).expand()
# it is now 'x'

# just evaluate any expression like this:    
result=expr.subs(x,123.231)

# result is now an equation containing: 123.231
# to get the float: 
result.evalf()

Sympy is found here http://docs.sympy.org/. In ubuntu it's found via synaptic.

Johan Lundberg
  • 26,184
  • 12
  • 71
  • 97
9

There is a way to implement the functions so that they are stable in a wide range of values but it involves a distinction of cases depending on the argument.

Take for example the inv_logit function. Your formula "np.exp(p) / (1 + np.exp(p))" is correct but will overflow for big p. If you divide numerator and denominator by np.exp(p) you obtain the equivalent expression

1. / (1. + np.exp(-p))

The difference being that this one will not overflow for big positive p. It will overflow however for big negative values of p. Thus, a stable implementation could be as follows:

def inv_logit(p):
    if p > 0:
        return 1. / (1. + np.exp(-p))
    elif p <= 0:
        np.exp(p) / (1 + np.exp(p))
    else:
        raise ValueError

This is the strategy used in the library LIBLINEAR (and possibly others).

Fabian Pedregosa
  • 6,329
  • 1
  • 22
  • 20
4

Nowadays, scipy has logit and expit (inverse logit) functions, eg

>>> from scipy.special import logit, expit
>>> import numpy as np
>>> logit([0, 0.25, 0.5, 0.75, 1])
array([       -inf, -1.09861229,  0.        ,  1.09861229,         inf])
>>> expit([-np.inf, -1.5, 0, 1.5, np.inf])
array([ 0.        ,  0.18242552,  0.5       ,  0.81757448,  1.        ])

Shadi
  • 9,742
  • 4
  • 43
  • 65
2

You're running up against the precision limits for a IEEE 754 double-precision float. You'll need to use higher-precision numbers and operations if you want a larger range and a more precise domain.

>>> 1 + np.exp(-37)
1.0
>>> 1 + decimal.Decimal(-37).exp()
Decimal('1.000000000000000085330476257')
Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
0

My variant of Fabian Pedregosa's answer: def stable_inv_logit(x): return 0.5*(1. + np.sign(x)*(2./(1. + np.exp(-np.abs(x))) - 1.))