1

I have some data with three columns, x, y, and b. Here, b has associated values for each x value. (N.b. x always increases)

That is, the data looks something like:

x y b
1 4 7
2 5 8
3 6 9

Say this data is fit by some model y = ax + math.log(b). I could compute each value by doing something like:

def findB(x):
    for i in range(0, len(xData)):
        if xData[i] >= x:
            return bData[i]

def f(x, a):
    b = findB(x)
    result = a*x + math.log(b)

    return result

Now say I want to optimise this fit for f(x,a) to find the parameter a using scipy.optimise.curve_fit.

I have attempted to do so like this:

popt, pcov = scipy.optimize.curve_fit(f, xData, yData)

but doing so results in the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is due to the curve_fit function passing all the x data as one array resulting in the if xData[i] >= x: statement being rendered ambiguous.

I can't figure out a away around this. I have tried creating a paired list of xData and bData and passing this to curve_fit to no avail.

This works if I simply hardcode b to some pre-determined value so I believe I am calling the curve_fit function correctly. Advice welcome.

Samidamaru
  • 1,061
  • 7
  • 18
  • Thanks, Leporello, I hadn't spotted that question and the solution there seems much more elegant,. – Samidamaru Jul 03 '19 at 11:58
  • Can you describe what `findB` is supposed to do in words? It's evolved from a single loop to a nested loop now, and frankly, neither one makes sense. – Mad Physicist Jul 03 '19 at 12:00
  • The problem is that `curve_fit` expects an array function, and `if` cannot be used on arrays, for that you use a mask. Same problem in `math.log` (it expects scalars, not vectors: use `np.log` instead). In your particular case, since all you are doing is the same as an excel VLOOKUP, and assuming your array is sorted (as in your example) you can get the `i` index using `np.searchsorted`: `def findB(x): return bData[np.searchsorted(xData, x)]` – Tarifazo Jul 03 '19 at 12:07

1 Answers1

1

In discussion with some colleagues, I think we found the solution.

Using a numpy array for the values of b and replacing math.log with np.log seems to solve this:

def findB(x):
    tempArray = np.zeros(len(x))

    for value in x:
        for i in range(0, len(xData)):
            if xData[i] >= value:
                tempArray[i] = bData[i]

    return tempArray
def f(x, a):
    b = findB(x)
    result = a*x + np.log(b)

    return result
Samidamaru
  • 1,061
  • 7
  • 18