I am confused by the explanations of "newton-cg" and "newton-cholesky" given in different sources. According to the sklearn documentation:
The “newton-cholesky” solver is an exact Newton solver that calculates the Hessian matrix and solves the resulting linear system.
Yet after reading a wonderful answer about LogisticRegression solvers, I thought that this is exactly what "newton-cg" does.
The source comparing different methods that the sklearn documentation links to did not really clarify things either.
So what is the difference between them?
EDIT 1
I know that GPT answers are banned for now, but I think I found the answer. According to GPT-4:
The "newton-cg" solver uses a Newton-CG conjugate gradient algorithm to minimize the objective function. This algorithm uses the gradient and the Hessian matrix (second-order derivative of the objective function) to iteratively find the minimum of the function. The Hessian matrix is approximated using the outer product of the gradient vector. This solver works well for small and medium-sized datasets.
On the other hand, the "newton-cholesky" solver is an exact Newton method: it explicitly computes the full Hessian matrix and solves the resulting linear system exactly with a Cholesky decomposition. This is very efficient when the number of samples is much larger than the number of features, but forming and factorizing the d x d Hessian costs O(d^2) memory and roughly O(d^3) time per iteration, so it does not scale to many features.
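And here is a hand-rolled sketch of the exact Newton/Cholesky iteration under the same assumptions (again, the variable names are mine, not sklearn's implementation):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_cholesky(X, y, alpha=1.0, n_iter=20, tol=1e-8):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        g = X.T @ (p - y) + alpha * w
        if np.linalg.norm(g) < tol:
            break
        # Form the full d x d Hessian explicitly -- O(d^2) memory --
        # then solve H step = g exactly via Cholesky factorization.
        W = p * (1 - p)
        H = (X * W[:, None]).T @ X + alpha * np.eye(d)
        c, low = cho_factor(H)
        w -= cho_solve((c, low), g)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X @ rng.normal(size=20) > 0).astype(float)
print(newton_cholesky(X, y)[:5])
```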
In summary, "newton-cg" trades exactness of the Newton step for memory efficiency, since it only ever touches the Hessian through Hessian-vector products, while "newton-cholesky" takes exact Newton steps at the cost of materializing and factorizing the full Hessian. The choice between them mostly comes down to the shape of the data: "newton-cholesky" when samples greatly outnumber features, "newton-cg" when the number of features is large.
To me, this makes sense because it aligns nicely with all the provided sources. The ambiguity in the sklearn documentation lies mainly in the "calculates" part: it does not explicitly say how the Hessian is computed in the 'newton-cholesky' method, nor how the 'newton-cg' method handles the Hessian at all.
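For completeness, both solvers optimize the same objective, so their fitted coefficients should agree to within solver tolerance. A quick check (assuming scikit-learn >= 1.2, where "newton-cholesky" was added):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
for solver in ("newton-cg", "newton-cholesky"):
    clf = LogisticRegression(solver=solver, C=1.0).fit(X, y)
    print(solver, clf.coef_[0, :3])  # near-identical coefficients expected
```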