Determening begin parameters 2D gaussian fit

Question

I'm working on some code which needs to be able to preform a 2d gaussian fitting. I mostly based my code on following question: Fitting a 2D Gaussian function using scipy.optimize.curve_fit - ValueError and minpack.error . Now is problem that I don't really have an initial guess about the different parameters that need to be used.

I've tried this:

def twoD_Gaussian(x_data_tuple, amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    (x,y) = x_data_tuple
    xo = float(xo)
    yo = float(yo)    
    a = (np.cos(theta)**2)/(2*sigma_x**2) + (np.sin(theta)**2)/(2*sigma_y**2)
    b = -(np.sin(2*theta))/(4*sigma_x**2) + (np.sin(2*theta))/(4*sigma_y**2)
    c = (np.sin(theta)**2)/(2*sigma_x**2) + (np.cos(theta)**2)/(2*sigma_y**2)
    g = offset + amplitude*np.exp( - (a*((x-xo)**2) + 2*b*(x-xo)*(y-yo) 
                            + c*((y-yo)**2)))
    return g.ravel()

The data.reshape(201,201) is just something I took from the aformentioned question.

mean_gauss_x = sum(x * data.reshape(201,201)) / sum(data.reshape(201,201))
sigma_gauss_x = np.sqrt(sum(data.reshape(201,201) * (x - mean_gauss_x)**2) / sum(data.reshape(201,201)))

mean_gauss_y = sum(y * data.reshape(201,201)) / sum(data.reshape(201,201))
sigma_gauss_y = np.sqrt(sum(data.reshape(201,201) * (y - mean_gauss_y)**2) / sum(data.reshape(201,201)))


initial_guess = (np.max(data), mean_gauss_x, mean_gauss_y, sigma_gauss_x, sigma_gauss_y,0,10)


popt, pcov = curve_fit(twoD_Gaussian, (x, y), data, p0=initial_guess)

data_fitted = twoD_Gaussian((x, y), *popt)

If I try this, I get following error message: ValueError: setting an array element with a sequence.

Is the reasoning about the begin parameters correct? And why do I get this error?

score 1 · Accepted Answer · answered Aug 13 '19 at 15:19

If I use the runnable code from the linked question and substitute your definition of initial_guess:

mean_gauss_x = sum(x * data.reshape(201,201)) / sum(data.reshape(201,201))
sigma_gauss_x = np.sqrt(sum(data.reshape(201,201) * (x - mean_gauss_x)**2) / sum(data.reshape(201,201)))

mean_gauss_y = sum(y * data.reshape(201,201)) / sum(data.reshape(201,201))
sigma_gauss_y = np.sqrt(sum(data.reshape(201,201) * (y - mean_gauss_y)**2) / sum(data.reshape(201,201)))

initial_guess = (np.max(data), mean_gauss_x, mean_gauss_y, sigma_gauss_x, sigma_gauss_y,0,10)

Then

print(inital_guess)

yields

(13.0, array([...]), array([...]), array([...]), array([...]), 0, 10)

Notice that some of the values in initial_guess are arrays. The optimize.curve_fit function expects initial_guess to be a tuple of scalars. This is the source of the problem.

The error message

ValueError: setting an array element with a sequence

often arises when an array-like is supplied when a scalar value is expected. It is a hint that the source of the problem may have to do with an array having the wrong number of dimensions. For example, it might arise if you pass a 1D array to a function that expects a scalar.

Let's look at this piece of code taken from the linked question:

x = np.linspace(0, 200, 201)
y = np.linspace(0, 200, 201)
X, Y = np.meshgrid(x, y)

x and y are 1D arrays, while X and Y are 2D arrays. (I've capitalized all 2D arrays to help distinguish them from 1D arrays).

Now notice that Python sum and NumPy's sum method behave differently when applied to 2D arrays:

In [146]: sum(X)
Out[146]: 
array([    0.,   201.,   402.,   603.,   804.,  1005.,  1206.,  1407.,
        1608.,  1809.,  2010.,  2211.,  2412.,  2613.,  2814.,  3015.,
        ...
       38592., 38793., 38994., 39195., 39396., 39597., 39798., 39999.,
       40200.])

In [147]: X.sum()
Out[147]: 4040100.0

The Python sum function is equivalent to

total = 0
for item in X:
    total += item

Since X is a 2D array, the loop for item in X is iterating over the rows of X. Each item is therefore a 1D array representing a row of X. Thus, total ends up being a 1D array.

In contrast, X.sum() sums all the elements in X and returns a scalar.

Since initial_guess should be a tuple of scalars, everywhere you use sum you should instead use the NumPy sum method. For example, replace

mean_gauss_x = sum(x * data) / sum(data)

with

mean_gauss_x = (X * DATA).sum() / (DATA.sum())

import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt

# define model function and pass independant variables x and y as a list
def twoD_Gaussian(data, amplitude, xo, yo, sigma_x, sigma_y, theta, offset):
    X, Y = data
    xo = float(xo)
    yo = float(yo)
    a = (np.cos(theta) ** 2) / (2 * sigma_x ** 2) + (np.sin(theta) ** 2) / (
        2 * sigma_y ** 2
    )
    b = -(np.sin(2 * theta)) / (4 * sigma_x ** 2) + (np.sin(2 * theta)) / (
        4 * sigma_y ** 2
    )
    c = (np.sin(theta) ** 2) / (2 * sigma_x ** 2) + (np.cos(theta) ** 2) / (
        2 * sigma_y ** 2
    )
    g = offset + amplitude * np.exp(
        -(a * ((X - xo) ** 2) + 2 * b * (X - xo) * (Y - yo) + c * ((Y - yo) ** 2))
    )
    return g.ravel()


# Create x and y indices
x = np.linspace(0, 200, 201)
y = np.linspace(0, 200, 201)
X, Y = np.meshgrid(x, y)

# create data
data = twoD_Gaussian((X, Y), 3, 100, 100, 20, 40, 0, 10)
data_noisy = data + 0.2 * np.random.normal(size=data.shape)
DATA = data.reshape(201, 201)


# add some noise to the data and try to fit the data generated beforehand
mean_gauss_x = (X * DATA).sum() / (DATA.sum())
sigma_gauss_x = np.sqrt((DATA * (X - mean_gauss_x) ** 2).sum() / (DATA.sum()))

mean_gauss_y = (Y * DATA).sum() / (DATA.sum())
sigma_gauss_y = np.sqrt((DATA * (Y - mean_gauss_y) ** 2).sum() / (DATA.sum()))


initial_guess = (
    np.max(data),
    mean_gauss_x,
    mean_gauss_y,
    sigma_gauss_x,
    sigma_gauss_y,
    0,
    10,
)
print(initial_guess)
# (13.0, 100.00000000000001, 100.00000000000001, 57.106515650488404, 57.43620227324201, 0, 10)
# initial_guess = (3,100,100,20,40,0,10)

popt, pcov = optimize.curve_fit(twoD_Gaussian, (X, Y), data_noisy, p0=initial_guess)

data_fitted = twoD_Gaussian((X, Y), *popt)

fig, ax = plt.subplots(1, 1)
ax.imshow(
    data_noisy.reshape(201, 201),
    cmap=plt.cm.jet,
    origin="bottom",
    extent=(X.min(), X.max(), Y.min(), Y.max()),
)
ax.contour(X, Y, data_fitted.reshape(201, 201), 8, colors="w")
plt.show()

Hey! Thanks a lot for the answer, the code seems to work! I'll keep in mind the difference between sum() and np.sum() — Wouter Blokland, Aug 14 '19 at 06:43

Determening begin parameters 2D gaussian fit

1 Answers1