You are confusing hyper-parameters with parameters. All scikit-learn estimators whose names end in CV, like `LogisticRegressionCV`, `GridSearchCV`, or `RandomizedSearchCV`, tune hyper-parameters.
Hyper-parameters are not learnt from training on the data. They are set prior to learning, on the assumption that they will contribute to optimal learning. More information is present here:
> Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator classes. Typical examples include `C`, `kernel` and `gamma` for Support Vector Classifier, `alpha` for Lasso, etc.
In the case of `LogisticRegression`, `C` is a hyper-parameter which describes the inverse of the regularization strength: the higher the `C`, the less regularization is applied to the training. It is not that `C` will be changed during training; it stays fixed.
Now coming to `coef_`. `coef_` contains the coefficients (also called weights) of the features, which are learnt (and updated) during training. Depending on the value of `C` (and the other hyper-parameters set in the constructor), these can vary during training.
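To make the distinction concrete, here is a minimal sketch on made-up toy data (the data and values are purely illustrative): `C` stays exactly what you set it to, while `coef_` only exists after `fit()` and changes with `C`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, purely for illustration
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression(C=0.01)  # hyper-parameter, set before learning
clf.fit(X, y)
print(clf.C)      # still 0.01 -- fit() never changes it
print(clf.coef_)  # learned parameters (the feature weights)

# A different C leads to different learned coefficients
clf2 = LogisticRegression(C=100.0).fit(X, y)
print(clf2.coef_)  # typically larger magnitudes, since less regularization is applied
```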
Now, there is another topic on how to get optimal initial values of `coef_` so that training is faster and better. That is optimization. Some approaches start with random weights between 0 and 1, others start with 0, etc. But for the scope of your question that is not relevant; `LogisticRegressionCV` is not used for that.
This is what `LogisticRegressionCV` does (a sketch of the procedure follows the list):
- Get the different values of `C` from the constructor (in your example you passed 1.0).
- For each value of `C`, cross-validate the supplied data: `LogisticRegression` is `fit()` on the training data of the current fold and scored on the test data of that fold. The test scores from all folds are averaged, and that average becomes the score of the current `C`. This is done for all the `C` values you provided, and the `C` with the highest average score is chosen.
- The chosen `C` is set as the final `C`, and `LogisticRegression` is trained again (by calling `fit()`) on the whole data (`Xdata`, `ylabels` here).
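Here is a rough sketch of that procedure in plain scikit-learn. This is a simplified illustration, not the actual `LogisticRegressionCV` implementation (which, among other things, can reuse solver state across folds); `Xdata` and `ylabels` are the variables from your example, and the candidate `C` values are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

Cs = [0.01, 0.1, 1.0, 10.0]  # candidate values for the hyper-parameter

# Steps 1-2: score each C by averaging cross-validation test scores
avg_scores = []
for C in Cs:
    scores = cross_val_score(LogisticRegression(C=C), Xdata, ylabels, cv=5)
    avg_scores.append(scores.mean())

# Step 3: pick the best C and refit on the whole data
best_C = Cs[int(np.argmax(avg_scores))]
final_model = LogisticRegression(C=best_C).fit(Xdata, ylabels)
```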
That is what all the hyper-parameter tuners do, be it `GridSearchCV`, `LogisticRegressionCV`, `LassoCV`, etc.
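For instance, the two snippets below tune `C` in roughly the same way (the candidate values are illustrative; both estimators refit on the whole data by default via `refit=True`):

```python
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.model_selection import GridSearchCV

Cs = [0.01, 0.1, 1.0, 10.0]

# Dedicated CV estimator for logistic regression
lr_cv = LogisticRegressionCV(Cs=Cs, cv=5).fit(Xdata, ylabels)

# Generic hyper-parameter tuner doing the same job
grid = GridSearchCV(LogisticRegression(), {'C': Cs}, cv=5).fit(Xdata, ylabels)

print(lr_cv.C_, grid.best_params_)
```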
The initializing and updating of `coef_` (the feature weights) is done inside the `fit()` function of the algorithm, which is out of scope for the hyper-parameter tuning. That optimization part depends on the internal optimization algorithm of the estimator, for example the `solver` param in the case of `LogisticRegression`.
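For example, switching `solver` changes how `coef_` is optimized internally, not which hyper-parameters are tuned. A small illustration (again using `Xdata` and `ylabels` from your example; the resulting coefficients will usually agree only up to small numerical differences):

```python
from sklearn.linear_model import LogisticRegression

# Same hyper-parameters, different internal optimizers for coef_
for solver in ['lbfgs', 'liblinear', 'saga']:
    clf = LogisticRegression(C=1.0, solver=solver, max_iter=1000).fit(Xdata, ylabels)
    print(solver, clf.coef_)
```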
Hope this makes things clear. Feel free to ask if you still have any doubts.