
I'm using least_squares optimization to fit the output of a numerical model to some measured data. At this stage I'm wondering how to determine appropriate values for the ftol, xtol and gtol parameters, since they determine how and where the optimization stops. These parameters seem to fit nicely into the framework of the algorithms, but I find it difficult to connect them to real-world properties.

For example, I have uncertainty estimates for my measured data, so once the optimizer reaches sufficient agreement between the model's output and the measured data within the limits of that uncertainty, it is reasonable to stop (i.e. np.all(np.abs(model_output - measured_data) < uncertainty)). However, it doesn't seem this can be expressed with the ftol parameter. The termination condition is dF < ftol * F (where F is the sum of squared residuals and dF is its change per iteration), so even though I could choose ftol so as to suppress updates smaller than the uncertainty once F has shrunk to that level, I would also risk premature termination far away from the desired solution. In the end it's the optimizer that decides how large a step to take at each iteration (and thus what dF is), so dF might be small compared to F even while the fit is still far from the desired solution.
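Since least_squares (at least in the versions I've worked with) exposes no callback hook for a custom stopping test, one workaround is to abort from inside the residual function once the data-driven criterion is met. Below is a minimal sketch of that idea; model, measured_data, uncertainty and x0 are placeholders for the actual problem, not names from the question's code:

```python
import numpy as np
from scipy.optimize import least_squares

class ToleranceReached(Exception):
    """Raised once every residual is within the measurement uncertainty."""

def fit_until_uncertainty(model, measured_data, uncertainty, x0):
    # Keep the most recently evaluated parameters in a closure, since
    # aborting via exception discards the optimizer's result object.
    best = {"params": np.asarray(x0, dtype=float)}

    def residuals(params):
        best["params"] = np.array(params, copy=True)
        r = model(params) - measured_data
        if np.all(np.abs(r) < uncertainty):
            raise ToleranceReached
        return r

    try:
        result = least_squares(residuals, x0)
        return result.x
    except ToleranceReached:
        return best["params"]
```

One caveat: the residual function is also called at perturbed points during finite-difference Jacobian evaluation, so the exception may fire at a point that is not an accepted step; whether that matters depends on how strict the criterion needs to be.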

Another aspect is the change of parameter values. The values obtained from the optimization will ultimately be used to adjust some real devices, and these have finite precision. For example, the devices won't distinguish values that differ by less than 1e-6. This means that, again, once sufficient agreement between the model's output and the measured data has been achieved, any parameter update smaller than 1e-6 is not meaningful. On the other hand, many small updates of < 1e-6 could sum up to a larger overall update > 1e-6, and I'm back at the same problem: it's up to the optimizer how large a step it takes, and by restricting this I fear I risk premature termination. In addition, the xtol parameter only describes a scaling factor between the parameter update and the current parameter values. While I could use a value that reflects the precision of the devices around the expected final parameter values, I've seen the optimizer reach intermediate parameter values two orders of magnitude larger than the final estimate, so this clearly risks premature termination.
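One way to decouple the device precision from xtol is to run the optimizer with tight tolerances and quantize the result afterwards, then verify that the quantized parameters still reproduce the data within the uncertainty. A sketch, again with model, measured_data and uncertainty as placeholders and 1e-6 taken from the example above:

```python
import numpy as np

DEVICE_RESOLUTION = 1e-6  # smallest parameter step the devices resolve

def quantize(params, resolution=DEVICE_RESOLUTION):
    """Snap parameters to the device's finite resolution."""
    return np.round(np.asarray(params) / resolution) * resolution

def acceptable(model, params, measured_data, uncertainty):
    """Check agreement with the data within the measurement uncertainty."""
    return np.all(np.abs(model(params) - measured_data) < uncertainty)

# Usage sketch, with `result` from least_squares:
#     params_device = quantize(result.x)
#     assert acceptable(model, params_device, measured_data, uncertainty)
```

This sidesteps the summation problem: the optimizer is free to take arbitrarily small steps internally, and the finite precision is only applied once, to the final estimate.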

While I find it difficult to choose appropriate values for the ftol, xtol and gtol parameters, it's equally unsatisfying to leave them at their default values without good arguments, since that amounts to tacitly accepting the defaults as reasonable.

a_guest
    Since this hasn't had any activity and isn't really a code question, you might want to consider one of the maths/comp-sci forums at https://stackexchange.com/sites#science – mdurant Feb 01 '22 at 16:32
  • @mdurant I was hoping that someone who has used this function as well might share their thoughts / insights about what they chose for the parameters and I thought that there are more *users* on SO than on stats or other sites. In the end, it's also about the specific implementation and why they chose to expose these three different stopping criteria and why the specific default values were chosen. But if the question doesn't receive an answer here, I will ask it again at one of the other SE sites. – a_guest Feb 01 '22 at 18:19
    My experience with fitting is similar to yours: sometimes the defaults work well, sometimes not, and it depends strongly on the algorithm chosen. I fear that might be the common scientists/coders' experience! – mdurant Feb 01 '22 at 21:06

1 Answer


The choice of ftol, xtol, and gtol is related to the speed of convergence of the specific optimization problem. Assuming a solution x_min + err was found, where err is the deviation from the true minimizer x_min, I like to think about the tolerances in the following (simplified) way:

ftol requires some insight into the shape of F around the minimum, i.e., how much a deviation err from x_min changes the value of F. More precisely, if |F(x_min + err) / F(x_min) - 1| < ftol, then err is negligibly small compared to x_min. Of course this depends strongly on the properties of F(x).
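To make that concrete, here is a hypothetical back-of-the-envelope calculation in 1-D (all numbers made up): if F is roughly quadratic near the minimum with curvature H, then F(x_min + err) ≈ F(x_min) + H·err²/2, which suggests an order of magnitude for ftol given the largest acceptable err:

```python
# Hypothetical numbers, for illustration only.
F_min = 1e-4   # cost at the minimum
H = 2e3        # curvature (second derivative of F) at x_min
err = 1e-5     # largest deviation from x_min we are willing to accept

# |F(x_min + err) / F(x_min) - 1| under the quadratic approximation
ftol_estimate = 0.5 * H * err**2 / F_min
print(f"ftol on the order of {ftol_estimate:.1e}")  # -> 1.0e-03
```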

xtol is the relative accuracy of x_min, i.e., |(x_min + err) / x_min - 1| < xtol.

Values of the infinity norm of the gradient below gtol are interpreted as a zero gradient, i.e., a stationary point. This also requires some insight into the shape of F and the noise level.
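A quick way to see which criterion actually fires in practice is to run a toy problem with explicit tolerances and inspect result.status, which the scipy documentation defines as 1 = gtol, 2 = ftol, 3 = xtol, and 4 = both ftol and xtol. The exponential fit below is made up purely for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * t) + rng.normal(0.0, 0.01, t.size)  # synthetic data

def residuals(p):
    a, b = p
    return a * np.exp(b * t) - y

result = least_squares(residuals, x0=[1.0, -1.0],
                       ftol=1e-10, xtol=1e-10, gtol=1e-10)
print(result.status)      # which tolerance terminated the run
print(result.optimality)  # infinity norm of the gradient at the solution
print(result.cost)        # 0.5 * sum of squared residuals
```

Tightening one tolerance at a time and watching status change is a cheap way to develop intuition for how the three criteria interact on a problem similar to one's own.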

I know those are fairly unspecific statements; everything revolves around the shape of F(x) in the neighborhood of the minimum x_min. Please note that there are many more factors to consider when using non-linear least-squares methods, e.g., it is useful to reason about the choice of estimation algorithm. In other words: are the statistical properties of the uncertainties (caused by measurement noise, model deviation, numerics, etc.) well-behaved enough that a meaningful estimate can be expected? Bias and consistency are the main characteristics here.

Dietrich