The libraries statsmodels and scikit-learn produce different values for the log-loss of the same fitted logistic regression. A toy example:
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import log_loss

df = pd.DataFrame(
    columns=['y', 'x1', 'x2'],
    data=[
        [1, 3, 5],
        [1, -2, 7],
        [0, -1, -5],
        [0, 2, 3],
        [0, 3, 5],
    ],
)

logit = sm.Logit(df.y, df.drop(columns=['y']))
res = logit.fit()
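For what it's worth, the results object exposes two candidate sets of predictions, and they do not appear to be on the same scale (this is just what I observe by printing them; I may be misreading the statsmodels API):

# res.fittedvalues and res.predict() give different numbers: the former
# includes values outside [0, 1], while the latter looks like
# probabilities. I am not sure which one log_loss expects.
print(res.fittedvalues)
print(res.predict())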
The result of res.llf is -1.386294361119906, while the result of -log_loss(df.y, res.fittedvalues) is -6.907755278982137. Shouldn't the two be equal, up to a small difference due to the different numerical implementations? The statsmodels documentation says that .llf is the log-likelihood of the fitted model, and as this question and this Kaggle post point out, log_loss is just the negative of the log-likelihood.
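If that relationship holds, I would have expected a check along the following lines to pass. This is only a sketch of what I think the equivalence should look like: normalize=False makes log_loss return a sum rather than a per-sample mean (llf is a sum over observations), and I am assuming res.predict() is the right source of predicted probabilities, which may be exactly where I am going wrong:

import numpy as np

# Expected equivalence: total log-likelihood == -(summed log-loss).
p = res.predict()  # predicted probabilities of y == 1
assert np.isclose(res.llf, -log_loss(df.y, p, normalize=False))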
Package versions: scikit-learn==1.0.1, statsmodels==0.13.5