
I am trying to solve a problem that resembles Fisher's iris classification. The catch is that I can train the model on my own computer, but the trained model has to predict class membership on a computer where it is impossible to install Python and scikit-learn. I want to understand how, given the coefficients of a logistic regression model, I can predict class membership without using the model's predict method. Using the Fisher iris problem as an example, I do the following.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, f1_score

# data preparation
iris = load_iris()
data = pd.DataFrame(data=np.hstack([iris.data, iris.target[:, np.newaxis]]), 
                    columns=iris.feature_names + ['target'])
names = data.columns

# split data
X_train, X_test, y_train, y_test = train_test_split(data[names[:-1]], data[names[-1]], random_state=42)

# train model
cls = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=2, random_state=42)
)
cls = cls.fit(X_train.to_numpy(), y_train)
preds_train = cls.predict(X_train)

# prediction
preds_test = cls.predict(X_test)

# scores
train_score = accuracy_score(preds_train, y_train), f1_score(preds_train, y_train, average='macro') # on train data
# train_score = (0.9642857142857143, 0.9653621232568601)
test_score = accuracy_score(preds_test, y_test), f1_score(preds_test, y_test, average='macro') # on test data
# test_score = (1.0, 1.0)

# model coefficients
cls[1].coef_, cls[1].intercept_

>>> (array([[-1.13948079,  1.30623841, -2.21496793, -2.05617771],
            [ 0.66515676, -0.2541143 , -0.55819748, -0.86441227],
            [ 0.47432404, -1.05212411,  2.77316541,  2.92058998]]),
      array([-0.35860337,  2.43929019, -2.08068682]))
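
For reference, coef_ has one row per class and one column per feature, and intercept_ has one entry per class:

cls[1].coef_.shape, cls[1].intercept_.shape

>>> ((3, 4), (3,))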

Now I have the coefficients of the model and want to use them to make predictions. First, I make a prediction with the predict_proba method for the first five observations in the test sample.

preds_test = cls.predict_proba(X_test)
preds_test[0:5]

>>>array([[5.66019001e-03, 9.18455687e-01, 7.58841233e-02],
          [9.75854479e-01, 2.41455095e-02, 1.10881450e-08],
          [1.18780156e-09, 6.53295166e-04, 9.99346704e-01],
          [6.71574900e-03, 8.14174200e-01, 1.79110051e-01],
          [6.98756622e-04, 8.09096425e-01, 1.90204818e-01]])

Then I manually calculate the predictions of the class probabilities for the observations using the coefficients of the model.

# define two functions for making predictions
def logit(x, w):
    return np.dot(x, w)

# from here: https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

n, k = X_test.shape
X_ = np.hstack((np.ones((n, 1)), X_test)) # add column with 1 for intercept
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_)) # create weights matrix
results = softmax(logit(X_, weights.T)) # calculate probabilities 
results[0:5]

>>>array([[3.67343725e-14, 4.63938438e-06, 9.99995361e-01],
          [2.81976786e-05, 8.63083152e-01, 1.36888650e-01],
          [1.24572182e-22, 5.47800683e-11, 1.00000000e+00],
          [3.32990060e-14, 3.08352323e-06, 9.99996916e-01],
          [2.66415118e-15, 1.78252465e-06, 9.99998217e-01]])

If you compare the two results (preds_test[0:5] and results[0:5]), you can see that they do not match at all. Please explain what I am doing wrong and how I can use the model's coefficients to calculate predictions without using the predict method.

2 Answers


I forgot that a scaler was applied in the pipeline. If the test data is scaled first, the results are the same.

scaler = StandardScaler()
scaler.fit(X_train)
X_test_transf = scaler.transform(X_test)

def logit(x, w):
    return np.dot(x, w)

def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

n, k = X_test_transf.shape
X_ = np.hstack((np.ones((n, 1)), X_test_transf))
weights = np.hstack((cls[1].intercept_[:, np.newaxis], cls[1].coef_))
results = softmax(logit(X_, weights.T))
np.allclose(preds_test, results)

>>>True
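
Alternatively, instead of refitting a scaler, the pipeline's already fitted scaler cls[0] can be used (cls[0].transform(X_test) gives the same transformed data). For the machine without Python and scikit-learn, only four arrays need to be exported: the scaler's mean_ and scale_ and the model's coef_ and intercept_. A minimal NumPy sketch of the computation to port, reusing the softmax defined above:

# everything needed to reproduce predictions without scikit-learn
mean_, scale_ = cls[0].mean_, cls[0].scale_           # standardization parameters
coef_, intercept_ = cls[1].coef_, cls[1].intercept_   # model coefficients

z = (X_test.to_numpy() - mean_) / scale_   # equivalent to cls[0].transform(X_test)
scores = z @ coef_.T + intercept_          # linear part (logits), one column per class
probs = softmax(scores)                    # class probabilities
labels = probs.argmax(axis=1)              # predicted class indices

np.allclose(probs, cls.predict_proba(X_test))  # should also be True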

predict_proba returns one column per class. For a binary problem there are two values per row: the first is the probability of the event not occurring and the second is the probability of the event occurring. Use predict_proba(X)[:, 1] to get the probability of the event occurring.
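
For example, on a binary problem (a sketch on a different dataset, not the iris data above):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

model.predict_proba(X)[:5]        # two columns: column 0 = P(class 0), column 1 = P(class 1)
model.predict_proba(X)[:, 1][:5]  # probability of the positive class only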
