
I fitted a logit model using statsmodels and stored the predicted values in a series:

import statsmodels.api as sm

M1 = sm.Logit(y_train, X_train)
M1_results = M1.fit()
y_pred = M1_results.predict(X_train)  # This returns a series

y_pred is a series with values between 0 and 1. I want to overwrite its values conditionally by comparing them to an arbitrary cutoff.

Basically, if the i-th element of y_pred is <= 0.7, overwrite it with 0. Otherwise, overwrite it with 1.

I tried combining a for loop with an if statement:

for i in y_pred:
    if i <= 0.7:
        i = 0
    else:
        i = 1

How come this didn't overwrite any of the values in y_pred?

I had to resort to boolean indexing (as suggested here):

y_pred[y_pred <= 0.7] = 0
y_pred[y_pred >  0.7] = 1
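Assuming y_pred is a pandas Series, the two masked assignments can also be collapsed into one vectorized step. The sketch below uses numpy.where, which evaluates the condition once, and rebuilds a Series on the original index. The sample values and index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities with a non-default index,
# as you might get after a train/test split.
y_pred = pd.Series([0.2, 0.9, 0.7, 0.71], index=[10, 3, 7, 42])

cutoff = 0.7
# 1 where the probability is above the cutoff, 0 otherwise (0.7 itself -> 0).
labels = pd.Series(np.where(y_pred > cutoff, 1, 0), index=y_pred.index)
print(labels.tolist())  # [0, 1, 0, 1]
```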

This will be inconvenient when I move on to multiclass models. How can I achieve the same result using for and if notation?

PS: Excuse my ignorance. I recently moved from R to Python and everything is really confusing.

Arturo Sbr

2 Answers


If y_pred is a list, you can use the enumerate function to iterate over it together with the indexes. This lets you set each item through its index in the list.

Code:

for i, item in enumerate(y_pred):
    if item <= 0.7:
        y_pred[i] = 0
    else:
        y_pred[i] = 1

Or you can use this one-liner:

y_pred = [0 if item <= 0.7 else 1 for item in y_pred]

Or even easier:

y_pred = [int(item > 0.7) for item in y_pred]
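If y_pred is a pandas Series rather than a list, note that both comprehensions return a plain list, so the Series index is lost. A sketch assuming a Series input: Series.map applies the same rule to each value and returns a new Series with the same index. The values and labels below are hypothetical.

```python
import pandas as pd

y_pred = pd.Series([0.65, 0.7, 0.95], index=["a", "b", "c"])

# map calls the function on each value and keeps the original index,
# unlike a list comprehension, which produces a plain list.
labels = y_pred.map(lambda p: 0 if p <= 0.7 else 1)
print(labels.tolist())  # [0, 0, 1]
```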
Olvin Roght

The reason your loop doesn't work is explained here. In short, the loop variable i is just a name bound to each value in turn; assigning i = 0 rebinds that name and does not modify the element stored in y_pred.
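A minimal illustration with a plain list (arbitrary values): reassigning the loop variable changes what the name refers to, while the list itself stays the same.

```python
probs = [0.2, 0.9]

for p in probs:
    p = 0  # rebinds the name p only; the list is not touched

print(probs)  # [0.2, 0.9]
print(p)      # 0, because the name p now refers to 0
```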

To stick with for and if statements, you can loop over the indices instead:

for i in range(len(y_pred)):
    if y_pred[i] <= 0.7:
        y_pred[i] = 0
    else:
        y_pred[i] = 1
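One caveat if y_pred is a pandas Series: on an integer index, y_pred[i] looks up the label i, not the i-th position. If the index is not 0..n-1 (for example after a shuffled train/test split), the loop above can raise a KeyError or touch the wrong rows. A sketch assuming such a Series, using .iloc to make the access explicitly positional; the index values are hypothetical.

```python
import pandas as pd

# Hypothetical Series whose index is not 0..n-1.
y_pred = pd.Series([0.2, 0.9, 0.8], index=[5, 0, 2])

# Label-based lookup: there is no label 1, so this raises KeyError.
try:
    y_pred[1]
except KeyError:
    print("no label 1")

# Positional lookup: .iloc[i] always means "the i-th element".
for i in range(len(y_pred)):
    if y_pred.iloc[i] <= 0.7:
        y_pred.iloc[i] = 0
    else:
        y_pred.iloc[i] = 1

print(y_pred.tolist())  # [0.0, 1.0, 1.0]
```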

That said, I don't see why you couldn't keep using boolean indexing in the multiclass case too, but that is probably a topic for another question.
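For the multiclass case, here is a sketch of two vectorized options on NumPy arrays, with no explicit loop. It assumes a model that returns one probability column per class (statsmodels' MNLogit is one example) for the first option, and a single score split by several cutoffs for the second. All numbers are made up.

```python
import numpy as np

# Hypothetical predicted probabilities: 3 observations x 3 classes.
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
])

# Option 1: index of the most probable class in each row.
classes = probs.argmax(axis=1)
print(classes.tolist())  # [1, 0, 2]

# Option 2: one score, several cutoffs. With right=True a value equal
# to a cutoff falls into the lower bin, matching the "<= 0.7 -> 0" rule.
scores = np.array([0.1, 0.5, 0.7, 0.95])
bins = np.digitize(scores, [0.3, 0.7], right=True)
print(bins.tolist())  # [0, 1, 1, 2]
```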

Zaccharie Ramzi