optimal way of defining a numerically stable sigmoid function for a list in python

Question

For a scalar variable x, we know how to write down a numerically stable sigmoid function in python:

def sigmoid(x):
    if x >= 0:
        return 1. / ( 1. + np.exp(-x) )
    else:
        return exp(x) / ( 1. + np.exp(x) )

For a list of scalars, say z = [x_1, x_2, x_3, ...], and suppose we don't know the sign of each x_i beforehand, we could generalize the above definition and try:

def sigmoid(z):
    result = []
    for x in z:
        if x >= 0:
            result.append(1. / ( 1. + np.exp(-x) ) )
        else:
            result.append( exp(x) / ( 1. + np.exp(x) ) )
    return result

This seems to work. However, I feel this is perhaps not the most pythonic way. How should I improve the definition in terms of 'cleanness'? Say, is there a way to use comprehension to shorten the function definition?

I'm sorry if this has been asked, because I cannot find similar questions on SO. Thank you very much for your time and help!

It used to be as simple as `scipy.special.expit(x)`, but [someone changed it](https://github.com/scipy/scipy/commit/30e181c1179177bd4e40c240ca70ce3b82dac873) back in 2016. — user2357112, Aug 22 '18 at 23:37
I'm not sure how important it is, though - I can find a few cases where the supposedly more stable version is about 1 ULP more accurate, but I can't find a solid analysis proving it's better, and [the closest thing to a source](http://fa.bianp.net/blog/2013/numerical-optimizers-for-logistic-regression/) I've found analyzes the behavior of overflow wrong, coming to the incorrect conclusion that the straightforward 1/(1+exp(-x)) implementation will return infinity for x=-100. — user2357112, Aug 22 '18 at 23:44
It seems like the superiority of the "stable" version may be a myth, or at most a tiny precision advantage. — user2357112, Aug 22 '18 at 23:56
@user2357112supportsMonica Infinity will be returned for ~`-710` with naive implementation (`numpy`, `double` precision) — Szymon Maszke, Nov 06 '20 at 16:29

DYZ · Accepted Answer · 2018-08-22T23:42:55.920

10

You are right, you can do better by using np.where, the numpy equivalent of if:

def sigmoid(x):
    return np.where(x >= 0, 
                    1 / (1 + np.exp(-x)), 
                    np.exp(x) / (1 + np.exp(x)))

This function takes a numpy array x and returns a numpy array, too:

data = np.arange(-5,5)
sigmoid(data)
#array([0.00669285, 0.01798621, 0.04742587, 0.11920292, 0.26894142,
#       0.5       , 0.73105858, 0.88079708, 0.95257413, 0.98201379])

edited Aug 22 '18 at 23:42

answered Aug 22 '18 at 23:30

DYZ

55,249
10
64
93

Thanks for such a quick response. This is a lovely solution. May I ask is it possible to get rid of the element-wise for loop where we need to check the sign of x one at a time? – RandomWalker Aug 22 '18 at 23:40
1

I do not follow you. There is no elementwise loop in the proposed solution. – DYZ Aug 22 '18 at 23:42
Okay, I made a mistake when I passed a python list z to your definition, so an exception was raised. So I though I still need the for loop in the original definition. Passing np.array(z) works. I'll mark the question as solved. Thanks so much! – RandomWalker Aug 22 '18 at 23:45
1

If you want to pass a list, just convert it to an array before passing (with `np.array(mylist)`). – DYZ Aug 22 '18 at 23:46
2

seems like the `np.where` evaluates both branches and then chooses which one it needs, this results in misleading overflow warnings. something like `sigmoid(np.array(-300, np.float32))` – titus May 27 '20 at 08:18
@titus You are right. See [my answer](https://stackoverflow.com/a/62860170/8776746) for a faster and more natural solution. – ynn Jul 12 '20 at 10:55
See [this answer](https://stackoverflow.com/a/64717799/10886420) for discussion and downsides of this solution (unnecessary double evaluation which gives the runtime error) – Szymon Maszke Nov 06 '20 at 15:57

Szymon Maszke · Answer 2 · 2022-09-17T12:18:34.620

Fully correct answer (no warnings) was provided by @hao peng but solution wasn't explained clearly. This would be too long for a comment, so I'll go for an answer.

Let's start with analysis of a few answers (pure numpy answers only):

@DYZ accepted answer

This one is correct mathematically but still gives us a warning. Let's look at the code:

def sigmoid(x):
    return np.where(
            x >= 0, # condition
            1 / (1 + np.exp(-x)), # For positive values
            np.exp(x) / (1 + np.exp(x)) # For negative values
    )

As both branches are evaluated (they are arguments, they have to be), the first branch will give us a warning for negative values and the second for positive.

Although the warnings will be raised, results from overflows will not be incorporated, hence the result is correct.

Downsides

unnecessary evaluation of both branches (twice as many operations as needed)
warnings are thrown

@ynn answer

This one is almost correct, BUT will work only on floating point values, see below:

def sigmoid(x):
    return np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )


sigmoid(np.array([0.0, 1.0]))  # [0.5 0.73105858] correct
sigmoid(np.array([0, 1]))  # [0, 0] incorrect

Why? Longer answer was provided by @mhawke in another thread, but the main point is:

It seems that piecewise() converts the return values to the same type as the input so, when an integer is input an integer conversion is performed on the result, which is then returned.

Downsides

no automatic casting due to strange behavior of piecewise function

Improved @hao peng answer

Idea of stable sigmoid comes from the fact that:

Both versions are equally efficient in terms of operations if coded correctly (one exp evaluation is enough). Now:

e^x will overflow when x is positive
e^-x will overflow when x is negative

Hence we have to branch on x equal to zero. Using numpy's masking we can transform only the part of array which is positive or negative with specific sigmoid implementations.

See code comments for additional points:

def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))


def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)


def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive

    # empty contains junk hence will be faster to allocate
    # Zeros has to zero-out the array after allocation, no need for that
    # See comment to the answer when it comes to dtype
    result = np.empty_like(x, dtype=np.float)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])

    return result

Time measurements

Results (50 times case test from ynn):

289.5070939064026 #DYZ
222.49267292022705 #ynn
230.81086134910583 #this

Indeed piecewise seems faster (not sure about the reasons, maybe masking and additional masking ops make it slower).

Code below was used:

import time

import numpy as np


def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))


def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)


def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive

    # empty contains juke hence will be faster to allocate than zeros
    result = np.empty_like(x)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])

    return result


N = int(1e4)
x = np.random.uniform(size=(N, N))

start: float = time.time()
for _ in range(50):
    y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
    y1 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )
    y2 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = sigmoid(x)
    y2 += 1
end: float = time.time()
print(end - start)

This is a fantastic explanation. Thanks for taking the time! — root_access, Nov 19 '21 at 09:15
Improved hao peng answer doesn’t seem to work if arrays contain integers instead of float — amarchin, Sep 16 '22 at 10:17

score 5 · Answer 3 · answered Jul 24 '19 at 08:29

def sigmoid(x):
    """
    A numerically stable version of the logistic sigmoid function.
    """
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)

This piece of code comes from assignment3 of cs231n, I don't really understand why we should calculate it in this way, but I know this may be the code that you are looking for. Hope to be helpful.

score 1 · Answer 4 · answered Jul 12 '20 at 10:54

The accepted answer is correct but, as pointed out by this comment, it calculates both branches and is thus problematic.

Rather, you may want to use np.piecewise(). This is much faster, meaningful (np.where is not intended to define a piecewise function) and free of misleading warnings caused by entering into both branches.

Benchmark

Source Code

import numpy as np
import time

N: int = int(1e+4)

np.random.seed(0)

x: np.ndarray = np.random.random((N, N))
x *= 1e+3

start: float = time.time()
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
end: float = time.time()
print()
print(end - start)

start: float = time.time()
y2 = np.piecewise(x, [x > 0], [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))])
end: float = time.time()
print(end - start)

assert (np.array_equal(y1, y2))

Result

np.piecewise() is silent and twice faster!

test.py:12: RuntimeWarning: overflow encountered in exp
  y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
test.py:12: RuntimeWarning: invalid value encountered in true_divide
  y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

6.32736349105835
3.138420343399048

See [my answer](https://stackoverflow.com/a/64717799/10886420) for comparison and why your solution does not work for edge case (`np.array` is integer). — Szymon Maszke, Nov 06 '20 at 16:25

score 0 · Answer 5 · 2018-08-22T23:58:03.703

0

Another alternative to your code is the following:

def sigmoid(z):
    return [(1. / (1. + np.exp(-x)) if x >= 0 else (np.exp(x) / (1. + np.exp(x))) for x in z]

edited Aug 22 '18 at 23:58

answered Aug 22 '18 at 23:44

3

There is really no point in using numpy functions and not using numpy vectorization. – DYZ Aug 22 '18 at 23:48
This is a very nice and clean alternative. Thanks a lot! – RandomWalker Aug 22 '18 at 23:48
1

@DYZ While I agree that your solution is shorter and more consequent, it is preferred to use list comprehensions instead of for loops. I will edit my answer so that everyone can see that this is an alternative to the code posted in the question, not to your solution. – Aug 22 '18 at 23:56

score 0 · Answer 6 · answered Jul 16 '20 at 14:19

I wrote one trick, I guess np.where or torch.where are implemented in the same manner to deal with binary conditions:

def sigmoid(x, max_v=1.0):    
    sign = (torch.sign(x) + 3)//3
    x = torch.abs(x)
    res = max_v/(1 + torch.exp(-x))
    res = res * sign + (1 - sign) * (max_v - res)
    return res

optimal way of defining a numerically stable sigmoid function for a list in python

6 Answers6

@DYZ accepted answer

Downsides

@ynn answer

Downsides

Improved @hao peng answer

Time measurements

Benchmark

Linked