-1

I have two lists that I create outside my function. Inside this function, which is called multiple times, the two lists are extended. The problem is that once the computations have finished, the two lists are empty. Here's the code I'm using:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

true_classes = []
predicted_classes = []

def report_cv(y_true, y_pred):
    true_classes.extend(y_true)
    predicted_classes.extend(y_pred)

    return accuracy_score(y_true, y_pred)

cv = StratifiedKFold(n_splits=5, shuffle=True)
rfr = RandomForestClassifier(n_estimators=1000, class_weight='balanced',
                             n_jobs=-1)

scores = cross_val_score(rfr,
                         X=data_ml_clean.iloc[:, 2:],
                         y=data_ml_clean.vDili,
                         cv=cv, n_jobs=-1,
                         scoring=make_scorer(report_cv))

print(classification_report(true_classes, predicted_classes))

I do not understand why they are not treated like global variables. Adding global true_classes inside the function does not help.

wrong_path

2 Answers

0

I had another look at this and have worked out what is probably wrong. First, here is your code with some additions:

true_classes = []
predicted_classes = []

def report_cv(y_true, y_pred):
    global true_classes
    global predicted_classes
    true_classes.extend(y_true)
    predicted_classes.extend(y_pred)

    return accuracy_score(y_true, y_pred)

cv = StratifiedKFold(n_splits=5, shuffle=True)
rfr = RandomForestClassifier(n_estimators=1000, class_weight='balanced',
                             n_jobs=-1)

def calculate_scores():
    # (no global keyword needed here)
    scores = cross_val_score(rfr,
                             X=data_ml_clean.iloc[:, 2:],
                             y=data_ml_clean.vDili,
                             cv=cv, n_jobs=-1,
                             scoring=make_scorer(report_cv))  # this call to report_cv should set the two global variables
    print(true_classes)
    print(predicted_classes)

# actual_y_true and actual_y_pred are placeholders: supply your own values
a = report_cv(actual_y_true, actual_y_pred)

print(classification_report(true_classes, predicted_classes))

So you only need the global keyword in the function that sets their values, as per this answer. (Strictly, global is only required when a function reassigns a name; in-place calls such as extend work without it.)
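To make the scoping point concrete, here is a minimal pure-Python sketch. The names collected, counter and the two helper functions are made up for illustration and are not from the question. It shows that extending a module-level list works without global, while rebinding a module-level name needs it.

```python
collected = []   # module-level list, mutated in place
counter = 0      # module-level int, has to be rebound to change

def extend_without_global(values):
    # No `global` needed: extend() mutates the existing list object.
    collected.extend(values)

def increment_with_global():
    # `global` is required because `counter += 1` rebinds the name.
    global counter
    counter += 1

extend_without_global([1, 2])
extend_without_global([3])
increment_with_global()
print(collected, counter)
```

So in a single process, the lists in the question would be filled with or without the global statements.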

Then, and this is really the problem, as @Goyo mentioned, report_cv is not being called. From the scikit-learn docs: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html

This factory function wraps scoring functions for use in GridSearchCV and cross_val_score. It takes a score function, such as accuracy_score, mean_squared_error, adjusted_rand_index or average_precision and returns a callable that scores an estimator’s output.

So "returns a callable" and "wraps" do not imply that the function is called.
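A small check of that distinction, sketched with a toy dataset and a hypothetical recording_accuracy function (both are my assumptions, not part of the question): make_scorer on its own does not run the score function, but the scorer it returns does run it once the scorer is invoked with an estimator and data.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, make_scorer

calls = []  # records every invocation of the score function

def recording_accuracy(y_true, y_pred):
    # Hypothetical score function that logs each call before scoring.
    calls.append(len(y_true))
    return accuracy_score(y_true, y_pred)

# Toy data, for illustration only.
X = [[0], [1], [2], [3]]
y = [0, 0, 0, 1]

scorer = make_scorer(recording_accuracy)
calls_after_wrapping = len(calls)   # wrapping alone has not called it

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
score = scorer(clf, X, y)           # invoking the scorer calls the function
calls_after_scoring = len(calls)

print(calls_after_wrapping, calls_after_scoring, score)
```

In cross_val_score it is the returned scorer that gets invoked once per fold, so whether the function runs depends on where and how that scorer is called.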

I put in a line that calls report_cv directly; you will need to give it values. This should make the global variables behave as you expect, so the lists should no longer be empty, but I can't promise it will make the rest of this sklearn code behave as expected.
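If the end goal is a single classification_report over all folds, one alternative that avoids module-level lists altogether is cross_val_predict, which returns one out-of-fold prediction per sample. This is only a sketch under assumptions: make_classification stands in for data_ml_clean, and the forest is kept small so it runs quickly. It may also be relevant that with n_jobs=-1 the scorer can run in separate worker processes, in which case lists extended there would not be visible in the parent process.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for data_ml_clean (an assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rfr = RandomForestClassifier(n_estimators=50, class_weight='balanced',
                             random_state=0)

# One out-of-fold prediction per sample, returned in the original row order.
y_pred = cross_val_predict(rfr, X, y, cv=cv)

report = classification_report(y, y_pred)
print(report)
```

Because the predictions come back as a return value, nothing depends on shared state between the scoring calls.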

cardamom
  • Thanks but it does not work. The two lists are still empty! – wrong_path Jun 17 '19 at 13:02
  • @wrong_path have you fixed it yet? For the purposes of debugging this, can you add `print(1, true_classes)` and `print(1, predicted_classes)` to your first function report_cv, before the return statement, and see whether the lists are empty even there? – cardamom Jun 18 '19 at 10:09
  • Not yet. I don't know what's wrong. If I print the lists inside the function, they are not empty but they are overwritten each time. – wrong_path Jun 18 '19 at 10:11
  • And are they empty the very last time they are written? – cardamom Jun 18 '19 at 10:13
  • No, they contain the predicted and true values of that *iteration*. – wrong_path Jun 18 '19 at 10:13
0

The report_cv function is not actually being called. Try building a lambda function, or alternatively call report_cv yourself first, save its return value, and use that value afterwards.