
I tried using scipy's curve_fit to fit a sigmoid with a fixed numpy seed, but the results still vary slightly between runs. Is there any way to make it completely deterministic?

As requested in the comments, here's a minimal working example:

from scipy.optimize import curve_fit
import numpy as np

def sigmoid(x, b, mu, max_kr):
    if isinstance(x, list) or isinstance(x, np.ndarray):
        return [sigmoid(xx, b, mu, max_kr) for xx in x]
    else:
        return max_kr/(1+10**(mu*(-x+b)))

def fit_sigmoid(points):
    xs, ys = list(zip(*points))
    popt, pcov = curve_fit(sigmoid, xs, ys, bounds=([-np.inf, 0, 0], [np.inf, np.inf, 1]), ftol=len(xs)*1e-6)
    b, mu, max_kr = popt
    return mu

np.random.seed(12)  # call seed(); assigning "np.random.seed = 12" would replace the function without seeding
points1 = [(4.0, 1.0), (1.0, 8.340850913002296e-05), (3.0, 0.9793319563421965), (0.0, 8.340850913002296e-05), (-1.0, 0.0), (2.0, 0.010306481917677357)]
points2 = [(4.0, 1.0), (-1.0, 0.0), (3.0, 0.9793319563421965), (0.0, 8.340850913002296e-05), (1.0, 8.340850913002296e-05), (2.0, 0.010306481917677357)]
print(fit_sigmoid(points1))
print(fit_sigmoid(points2))

It seems that the order of the points matters. Out of curiosity, what's the reason behind this?

mbison
  • Ok, seeing the downvotes, let me rephrase it. Is scipy's curve_fit algorithm deterministic for fitting sigmoids? I couldn't find information about it online. – mbison Jan 15 '19 at 15:04
  • It's likely just using a random seed; if you want it to be deterministic, you would just need to specify your own seed. See here: https://stackoverflow.com/questions/16016959/scipy-stats-seed – Nick Chapman Jan 15 '19 at 15:09
  • I tried fixing the seed (`numpy.random.seed = 12`) and then fitting a sigmoid like this: `popt, pcov = curve_fit(sigmoid, xs, ys, bounds=([-np.inf, 0, 0], [np.inf, np.inf, 1]), ftol=len(xs)*1e-6)` followed by `b, mu, max_kr = popt`. The mu values from two runs differ: 1292.3298349461788 vs. 1292.32983691704 – mbison Jan 15 '19 at 15:20
  • Can you create a [minimal, complete and verifiable example](https://stackoverflow.com/help/mcve) that demonstrates the issue? This would allow someone to copy and run the example, determine if it is reproducible and investigate the cause. – Warren Weckesser Jan 15 '19 at 19:22
  • Have you tried setting the seed every time before you run `fit_sigmoid`? – Nils Werner Jan 16 '19 at 15:58
  • @NilsWerner I just added np.random.seed=12 before both fit_sigmoid calls to try it out (if I understand correctly, that's what you're suggesting). Sadly, I still get varying results: 15.1102041356 15.1102040471 – mbison Jan 16 '19 at 16:09
  • Wait, you're trying to fit two different sets of data and expect exactly the same result? How is this supposed to happen?! – Nils Werner Jan 16 '19 at 18:20
  • @NilsWerner They are the same points, only in a different order. Why should the order of 2D points matter when fitting a curve, if they are the same points? – mbison Jan 17 '19 at 10:14
  • Ah, I didn't see that. The solution is then to simply sort the data before fitting. – Nils Werner Jan 17 '19 at 10:53

1 Answer

If you sort your data by x before running the curve fit, both orderings of these points turn into the same input arrays, so you will get reproducible results:

from scipy.optimize import curve_fit
import numpy as np

def sigmoid(x, b, mu, max_kr):
    if isinstance(x, list) or isinstance(x, np.ndarray):
        return [sigmoid(xx, b, mu, max_kr) for xx in x]
    else:
        return max_kr/(1+10**(mu*(-x+b)))

def fit_sigmoid(points):
    points = points[points[:, 0].argsort()]
    popt, pcov = curve_fit(sigmoid, points[:, 0], points[:, 1], bounds=([-np.inf, 0, 0], [np.inf, np.inf, 1]), ftol=len(points)*1e-6)
    b, mu, max_kr = popt
    return mu

points1 = np.array([
    (4.0, 1.0),
    (1.0, 8.340850913002296e-05),
    (3.0, 0.9793319563421965),
    (0.0, 8.340850913002296e-05),
    (-1.0, 0.0),
    (2.0, 0.010306481917677357)
])
points2 = np.array([
    (4.0, 1.0),
    (-1.0, 0.0),
    (3.0, 0.9793319563421965),
    (0.0, 8.340850913002296e-05),
    (1.0, 8.340850913002296e-05),
    (2.0, 0.010306481917677357)
])
print(fit_sigmoid(points1))
print(fit_sigmoid(points2))
# 15.110203876634552
# 15.110203876634552
Nils Werner
  • While this does work around the problem, it doesn't seem to answer the poster's question of why the order matters for curve_fit – Kristóf Szalay Jan 17 '19 at 14:15
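
On why the order matters: a likely explanation, not confirmed anywhere in this thread, is floating-point rounding. Floating-point addition is not associative, so any sum over the data points (for example a sum of squared residuals) can change in its last bits when the points are visited in a different order. With a loose stopping tolerance, such tiny differences can be enough for the optimizer to stop at slightly different parameters. The sketch below only demonstrates the rounding effect in plain Python; it does not reproduce curve_fit's internals. The helper `left_to_right_sum` and the tolerance `REL_TOL` are hypothetical names chosen for illustration, and the two mu values are the ones reported in the comments above.

```python
import math

def left_to_right_sum(values):
    """Add the values one at a time, in the order given (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same three numbers, summed in two different orders.
values = [0.1, 0.2, 0.3]
forward = left_to_right_sum(values)
backward = left_to_right_sum(reversed(values))

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False: the order changed the rounding

# The two mu values reported in the comments for the two orderings.
mu_run1 = 15.1102041356
mu_run2 = 15.1102040471

# REL_TOL is an assumed tolerance picked for this illustration only;
# it is not derived from curve_fit's ftol setting.
REL_TOL = 1e-6
print(mu_run1 == mu_run2)                               # False
print(math.isclose(mu_run1, mu_run2, rel_tol=REL_TOL))  # True
```

If this is indeed the cause, the two fits are not "random" but equal up to the precision the optimizer was asked for, so comparing results with a tolerance (or sorting the input first, as in the answer) is more appropriate than expecting bit-identical output.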