
I am using lmfit to solve a non-linear optimization problem. It works fine up to the point where I try to implement the measurement error as the standard deviation of the dependent variable y (sigma_y). My problem is that I cannot interpret the information criteria properly: when I switch the residual to `return (model - y)/(sigma_y)`, they jump from really low to very high values.

(Comparison of the fit reports, left: `return (model - y)`, right: `return (model - y)/(sigma_y)`.)

My guess is that this is somehow connected to bad usage of lmfit (wrong calculation of the information criteria, bad error scaling) or to a general lack of understanding of these criteria (to me, a reduced chi-square of 5.258 (uncertainties under-estimated) or 1.776e-4 (over-estimated) sounds like a really poor fit, but the plot of the residuals etc. looks okay to me...).

Here is my example code that reproduces the problem:

import lmfit
import numpy as np
import matplotlib.pyplot as plt

y = np.array([1.42774324, 1.45919099, 1.58891696, 1.78432768, 1.97403125,
       2.17091161, 2.02065766, 1.83449279, 1.64412613, 1.47265405,
       1.4507    ])
sigma_y = np.array([0.00312633, 0.00391546, 0.00873894, 0.01252675, 0.00639111,
       0.00452488, 0.0050566 , 0.01127185, 0.01081748, 0.00227587,
       0.00190191])
x = np.array([0.02372331, 0.07251188, 0.50685822, 1.30761963, 2.10535442,
       2.90597497, 2.30733552, 1.50906939, 0.7098041 , 0.09580686,
       0.04777082])
offset = np.array([1.18151707, 1.1815602 , 1.18202847, 1.18187962, 1.18047493,
       1.17903493, 1.17962602, 1.18141625, 1.18216907, 1.18222051,
       1.18238949])
parameter = lmfit.Parameters()
parameter.add('par_1', value=1e-5)
parameter.add('par_2', value=0.18)

def U_reg(parameter, x, offset):
    # objective function: residual weighted by the measurement uncertainty
    par_1 = parameter['par_1']
    par_2 = parameter['par_2']
    model = offset + 0.03043211217 * np.arcsinh(x / (2 * par_1)) + par_2 * x
    return (model - y) / sigma_y  # y and sigma_y are taken from the enclosing scope

mini = lmfit.Minimizer(U_reg, parameter, fcn_args=(x, offset), nan_policy='omit', scale_covar=False)
regr1 = mini.minimize(method='leastsq')  # 'leastsq' selects Levenberg-Marquardt
print("Levenberg-Marquardt:")
print(lmfit.fit_report(regr1, min_correl=0.9))

def U_plot(parameter, x, offset):
    # same model, but returns the unweighted residual for plotting
    par_1 = parameter['par_1']
    par_2 = parameter['par_2']
    model = offset + 0.03043211217 * np.arcsinh(x / (2 * par_1)) + par_2 * x
    return model - y

plt.title("Model vs Data")
plt.plot(x, y, 'bo', label='data')
plt.plot(x, U_plot(regr1.params, x, offset) + y, 'r--', label='Levenberg-Marquardt')
plt.ylabel("y")
plt.xlabel("x")
plt.legend(loc='best')
plt.show()

I know that it might be more convenient to implement the weighting via lmfit.Model, but because of the multiple regressors I want to keep the lmfit.Minimizer approach. My Python version is 3.8.5, lmfit is 1.0.2.
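For reference, the jump in the criteria can be reproduced directly from the residual array. Here is a minimal sketch using the formulas documented for lmfit's MinimizerResult (the helper name fit_stats is just illustrative, not part of lmfit):

import numpy as np

def fit_stats(residual, nvarys):
    # recompute lmfit's statistics from a residual array
    # (formulas as documented for lmfit's MinimizerResult)
    ndata = residual.size
    chisqr = (residual**2).sum()                   # sum of squared residuals
    redchi = chisqr / (ndata - nvarys)             # reduced chi-square
    neg2_log_likel = ndata * np.log(chisqr / ndata)
    aic = neg2_log_likel + 2 * nvarys              # Akaike information criterion
    bic = neg2_log_likel + np.log(ndata) * nvarys  # Bayesian information criterion
    return chisqr, redchi, aic, bic

Calling fit_stats(regr1.residual, nvarys=2) should reproduce the values in the fit report. Dividing (model - y) by sigma_y (values around 0.002-0.013) multiplies each residual entry by roughly 80-500, so chisqr grows by the square of that factor, which is exactly the jump from really low to very high criteria.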

1 Answer


Well, in order for the magnitude of chi-square to be meaningful (for example, for it to be around N_data - N_varys), the scale of the uncertainties has to be correct -- giving the 1-sigma standard deviation for each observation.

It's not really possible for lmfit to detect whether the sigma you use has the right scale.

– M Newville
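A minimal sketch of that scale dependence (the toy residuals and the factor k below are purely illustrative): uniformly rescaling the uncertainties by a factor k rescales chi-square by 1/k^2, so the absolute value of reduced chi-square only carries meaning when sigma_y really is the 1-sigma uncertainty of each observation.

import numpy as np

# toy data: the model is exact, all scatter is statistical with known sigma
rng = np.random.default_rng(0)
sigma_true = 0.01
resid = rng.normal(0.0, sigma_true, size=1000)

for k in (0.5, 1.0, 2.0):  # under-estimated, correct, over-estimated sigma
    # reduced chi-square, ignoring nvarys for simplicity
    redchi = ((resid / (k * sigma_true))**2).sum() / resid.size
    print(f"sigma scaled by {k}: redchi ~ {redchi:.2f}")  # scales as 1/k**2

This is also why the 2-sigma weighting discussed in the comments below divides the reduced chi-square by 4 (5.3 / 4 ≈ 1.3).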
  • Currently the 1-sigma standard deviation is calculated by `pandas.DataFrame.std` with a normalization by N-1. Prior to this, I also prepare the data to approximate static behavior; that's why the [standard deviations](https://imgur.com/GVCDWra) are pretty low. Right now the [unweighted fit](https://imgur.com/Tuu9uYZ) clearly outperforms the [weighted fit](https://imgur.com/LarHSi5), so I guess I have to accept that and fit with `return (model - y)` when the deviations tend to zero? – Fabian Pascher Jun 12 '21 at 12:56
  • @FabianPascher what does "with a normalization by N-1" mean? You almost certainly want to scale "model-data" by the uncertainties in the data. Using the 1-sigma "standard deviation" of the data is a fine approach. There is no N-1 in that. It's just (and exactly) the square-root of the variance. I also have no idea what "now unweighted Fit clearly outruns weighted Fit" means. You mean in the sense that one is faster than the other? – M Newville Jun 12 '21 at 18:16
  • @M Newville You can divide the numerator sum_i (x_i - mu)^2 of the variance either by N (population) or by N-1 (sample, also called the "unbiased estimator" of the standard deviation), with N as the number of measurements; see the sketch after this thread. Since my groups are very large in N, this doesn't make a big [difference](https://jpst.it/2xeRZ) and was a bit misleading to mention... _You mean in the sense that one is faster than the other?_ - No, I meant that the 1-sigma approach with the given sigma_y led to worse regression results (i.e. the information criteria and the plots I posted in the comment before). – Fabian Pascher Jun 13 '21 at 09:54
  • @M Newville I am currently searching for different approaches to improve the regression result of my example code. My "reference" is the `return (model - y)` fit, which seems to return the best set of parameters (with a sadly really small reduced chi-square of **1.8e-4**). Applying 1-sigma weights leads to a really high reduced chi-square of **5.3**; this can be compensated with 2-sigma weights, but that somehow feels like cheating, and it only corrects the reduced chi-square to **1.3**, while Akaike **4.8** and Bayesian **5.6** are still positive. The 2-sigma parity plot stays basically the same as 1-sigma. – Fabian Pascher Jun 13 '21 at 11:04
  • Here are some parity plots of [no-weight](https://imgur.com/gLFerkr), [1-sigma](https://imgur.com/RYcnLRh) and [2-sigma](https://imgur.com/naJKEZ1).
  • Demanding that reduced chi-square be very close to 1 (say, being between 0.5 and 1.5) requires two things: a) that the model is a perfectly accurate description of your data, including that there are no systematic errors in your data -- that all the variance in the data is statistical fluctuation, and b) that you have a very accurate estimate of those 1-sigma fluctuations. To be clear, I don't mean that those should be true "in principle", but specifically for the actual data. For the kinds of measurements I work with, neither of these is ever true. – M Newville Jun 13 '21 at 19:55
  • Your plots all look about the same to me -- should one of these be "obviously better"? If you want to convey that, you'll have to include error bars for each value. – M Newville Jun 13 '21 at 19:58
  • To _a)_: After data preparation there might be heteroscedasticity left in some of the regressors that define the offset and the dependent variable y; some groups fail hypothesis tests for normality (Shapiro, Anderson, D'Agostino, etc. | alpha = 0.05), and weak multicollinearity might also have a small impact. To _b)_: Each observation contains around 110 measurements, and 70 % of the observations test as normally distributed (30 % as Gaussian-like or as complete failures). So 1-sigma should be approximately right. – Fabian Pascher Jun 14 '21 at 08:43
  • According to the information criteria, the non-weighted fit is much better (negative Akaike & Bayesian, reduced chi-square of 1.7756e-04). Statistics of the parity plot show small improvements when using no weights instead of 1-sigma weights (identical result to 2-sigma): MAE becomes 0.002 smaller, RMSE 0.014 smaller, and RMSE/Std(y) decreases by 0.01 when fitting the non-weighted `return (model - y)`. Here are the plots with 1-sigma error bars: [unweighted](https://imgur.com/YlPfjXi) & [1-sigma weighted](https://imgur.com/XffNr7H). – Fabian Pascher Jun 14 '21 at 09:33
  • To my mind, non-weighted is better than 1-sigma weighted. But I may be wrong; here are some additional plots of the residuals: [unweighted fit](https://imgur.com/6TUdTst) & [1-sigma weighted fit](https://imgur.com/TgRfLJk) (the error bars in the 1-sigma weighted plot are barely visible due to the high residuals). – Fabian Pascher Jun 14 '21 at 09:41
  • "non-weighting" is weighting by 1.0. If you have a better estimate of the uncertainties in the data than "all uncertainties have the value 1.0", you should use that. If weighting by 1.0 gives a better result, then maybe your careful and precise statistics are simply wrong. If a plot of the data with error bars and the best fit model shows systematic disagreement far outside the error bars then all the careful and precise statistics will not really improve the obvious conclusion of "not a good fit". – M Newville Jun 14 '21 at 16:46