R nleqslv difficulties - solving for pH in an acid-base buffer

Question

Goal

Build a theoretical titration curve for the phosphoric acid buffer (1M).

I provide a fully reproducible and self-contained example (of my failures ^.^).

Model equations

Acid-base equilibrium equations for phosphoric acid are:

Model implementation

Ka.1 <- 7.1 * 10^-3 
Ka.2 <- 6.3 * 10^-8
Ka.3 <- 4.5 * 10^-13
Kw <- 10^-14

balance <- function(vars, Na_ca, P_ca, convert.fun=function(x) x){
  # Apply positive only constraint
  vars <- convert.fun(vars)
  
  H <- vars[1]
  H3A <- vars[2]
  H2A <- vars[3]
  HA <- vars[4]
  A <- vars[5]
  Na <- convert.fun(Na_ca)

  eq.system <- c(H3A + H2A + HA + A - P_ca, 
                 H + Na - Kw/H - H2A - 2*HA - 3*A, 
                 H * H2A / Ka.1 - H3A,
                 H * HA  / Ka.2 - H2A, 
                 H * A   / Ka.3 - HA
                 )
  return(eq.system)
}

Notice that convert.fun is there to try different ways of forcing positive values on concentrations.

The return value is a vector of the model's equations, equated to zero (is this right?).

Iteration

I wished to solve the system for all possible Na+ concentrations, up to 3 equivalence "volumes".

I set initial conditions that woked for the lowest one: [Na]=0.

Then solved it with nleqslv and used the result to "seed" the next iteration.

And it seemed to work nicely:

But, on close inspection, the issues will become obvious.

But, before that, some code!

Setup initial conditions and results matrix:

P_ca <- 1
ci.start <- c(H=10^-1, H3A=0.9, H2A=0.1, HA=0.1, A=0.1)
Na.seq <- seq(from=0,to=3*P_ca,by=P_ca/1000)

varnames <- c("Na", "H",  "H3A", "H2A", "HA",  "A")

result.m <- matrix(ncol = length(varnames), nrow = length(Na.seq))
colnames(result.m) <- varnames

result.m[,1] <- Na.seq

Iteration:

convert.fun <- function(x) abs(x)

for(i in 1:length(Na.seq)){
  
  Na_ca <- result.m[i,1]
  
  if(i == 1){                   # Si es la primera iteración,
    ci <- ci.start              # usar los valores "start" como C.I.
  } else {                      # Si no,
    ci <- result.m[i-1, 2:6]    # usar los valores de la solución anterior
  }
  
  result <- nleqslv::nleqslv(x = ci, 
                             fn = balance, 
                             Na = Na_ca, P = P_ca,
                             convert.fun = convert.fun,
                             method="Newton",  #method="Broyden",
                             global="dbldog",
                             control = list(allowSingular=TRUE,
                                            maxit=1000))
  
  result$x <- convert.fun(result$x)
  
  result.m[i,2:6] <- result$x
  
  stopifnot(all(result$x >= 0))
  
} # END LOOP

result.df <- as.data.frame(result.m)

Notice that convert.fun is now abs(x) (is this ok?).

The problem

The problem with the last plot is that the right part of it is flattened out.

The problem is even more obvious in the following plot:

The red curve is supossed to end up at the top, and the purple one at the bottom. This seems to start happening at Na~2, but after a few more iterations, the result flattens out (and becomes exactly constant).

Possible hints for the savvy

The problem is a bit worse using method="Broyden" instead of "Newton".
nleqslv's return message changes from "Function criterion near zero" to "x-values within tolerance 'xtol'".
I also tried adding a Jacobian. That didnt change the result, but at the problematic point I get something like this:

    Chkjac  possible error in jacobian[2,1] =  2.7598836276240e+06
                             Estimated[2,1] =  1.1104869955110e+04

I am now really out of ideas! And would really appreciate some help or guidance.

Bhas · Accepted Answer · 2021-03-31T14:04:08.030

You should always test the termination code of nleqslv to determine if a solution has been found. And somehow display the termination code and/or the message nleqslv returns. You will see that in some case no better point was found. Therefore any result is invalid and useless.

You are using so many values for Na.seq that it is impossible to the wood through the trees.

I would suggest starting with a very limited set of values for Na.seq. Something like

Na.seq <- seq(from=0,to=3*P_ca,by=P_ca/10)

and also this to include the termination code in the result

varnames <- c("Na", "H",  "H3A", "H2A", "HA",  "A", "termcd")

result.m <- matrix(ncol = length(varnames), nrow = length(Na.seq))

And then change the iteration loop to this

for(i in 1:length(Na.seq)){

  Na_ca <- result.m[i,1]

  if(i == 1){                   # Si es la primera iteración,
    ci <- ci.start              # usar los valores "start" como C.I.
  } else {                      # Si no,
    ci <- result.m[i-1, 2:6]    # usar los valores de la solución anterior
  }

  iter.trace <- 1
  cat("Iteration ",i,"\n\n")
  result <- nleqslv::nleqslv(x = ci, 
                             fn = balance, 
                             Na = Na_ca, P = P_ca,
                             convert.fun = convert.fun,
                             method="Newton",  #method="Broyden",
                             global="dbldog",
                             control = list(allowSingular=TRUE,
                                            maxit=1000,trace=iter.trace))
  cat("\n\n ",result$message,"\n\n") 
  result$x <- convert.fun(result$x)

  result.m[i,2:6] <- result$x
  result.m[i,7] <- result$termcd

  stopifnot(all(result$x >= 0))

} # END LOOP

and start analysing the output to find out what the problem is and where.

Addendum

I am reasonably sure that the difficulties with solving are (partly) caused by numerical difficulties. With the above modifications I changed the values for Ka.1, Ka.2, Ka.3,and Kw to

Ka.1 <- 7.1 * 10^-1
Ka.2 <- 6.3 * 10^-3
Ka.3 <- 4.5 * 10^-3
Kw <- 10^-3

and then there are no problems in finding a solution (all termination codes are 1). I suspect that the very small values for the K... constants are the cause of the problem. Check the system for possible errors or try to change the measurement units of the variables.

Thank you Bhas! I have tried your variants (and included changing scalex, see plots here: https://imgur.com/a/pkMmggr ). Next, I'll follow your suggestion and try to transform the system somehow, to avoid small/large numbers. Do you happen to know what scalex really did? — Naiky, Mar 31 '21 at 16:24
`nleqslv` works with scaled variables. A user function works with unscaled variables. The scale factors are the inverse of the size of the variables. Default is `1`. — Bhas, Mar 31 '21 at 18:20
Thanks! Yep i read the documentation, i was struggling to understand what effect it might have on numerical problems. I lowered the Ka's as suggested, but [the problem persists](https://imgur.com/a/dgyJCCD) (but it is no longer constant!). As you said, numerical problems really seem to be the cause, because by just reordering [one expression](https://imgur.com/a/o5uUyUE) i get something better (i was inspired by [this answer](https://stackoverflow.com/a/49265217/11524079)). — Naiky, Mar 31 '21 at 19:36
[It finally works](https://imgur.com/a/UIyHDL0)! Thank you! ¿Should i accept your answer and make second one for the details? The key changes was setting `scalex=1/ci` specifying the Jacobian, and tuning `xtol=1e-10` and `maxit=1000`. — Naiky, Mar 31 '21 at 21:56

Naiky · Answer 2 · 2021-04-01T00:33:50.707

Solution details

Find details and full code at this repo.

The numerical method worked, and the analytical answer provided at chemistry stackexchange happily coincides :)

Sadly it does not match experimental data from Julia Martín et al (DOI 10.20431/2349-0403.0409002). Perhaps I'll post a question about it on chemistry stackexchange.

My thanks to everyone who helped out <3

Lastly, important plots from the numerical simulation: