Keras Wrappers for Scikit Learn - AUC scorer is not working

Question

I'm trying to use Keras Scikit Learn Wrapper in order to make random search for parameters easier. I wrote an example code here where :

I generate an artificial dataset:

I am using moons from scikit learn

from sklearn.datasets import make_moons
dataset = make_moons(1000)

Model builder definition:

I define build_fn function needed:

def build_fn(nr_of_layers = 2,
             first_layer_size = 10,
             layers_slope_coeff = 0.8,
             dropout = 0.5,
             activation = "relu",
             weight_l2 = 0.01,
             act_l2 = 0.01,
             input_dim = 2):

    result_model = Sequential()
    result_model.add(Dense(first_layer_size,
                           input_dim = input_dim,
                           activation=activation,
                           W_regularizer= l2(weight_l2),
                           activity_regularizer=activity_l2(act_l2)
                           ))

    current_layer_size = int(first_layer_size * layers_slope_coeff) + 1

    for index_of_layer in range(nr_of_layers - 1):

        result_model.add(BatchNormalization())
        result_model.add(Dropout(dropout))
        result_model.add(Dense(current_layer_size,
                               W_regularizer= l2(weight_l2),
                               activation=activation,
                               activity_regularizer=activity_l2(act_l2)
                               ))

        current_layer_size = int(current_layer_size * layers_slope_coeff) + 1

    result_model.add(Dense(1,
                           activation = "sigmoid",
                           W_regularizer = l2(weight_l2)))

    result_model.compile(optimizer="rmsprop", metrics = ["accuracy"], loss = "binary_crossentropy")

    return result_model

NeuralNet = KerasClassifier(build_fn)

Parameter grid definition :

Then I defined a parameter grid :

param_grid = {
    "nr_of_layers" : [2, 3, 4, 5],
    "first_layer_size" : [5, 10, 15],
    "layers_slope_coeff" : [0.4, 0.6, 0.8],
    "dropout" : [0.3, 0.5, 0.8],
    "weight_l2" : [0.01, 0.001, 0.0001],
    "verbose" : [0],
    "batch_size" : [1],
    "nb_epoch" : [30]
}

RandomizedSearchCV phase:

I defined RandomizedSearchCV object and fitted with values from artificial dataset:

random_search = RandomizedSearchCV(NeuralNet, 
    param_distributions=param_grid, verbose=2, n_iter=1, scoring="roc_auc")
random_search.fit(dataset[0], dataset[1])

What I got (after running this code in console) is :

Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-c5bdbc2770b7>", line 2, in <module>
    random_search.fit(dataset[0], dataset[1])
  File "C:\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 996, in fit
    return self._fit(X, y, sampled_params)
  File "C:\Anaconda2\lib\site-packages\sklearn\grid_search.py", line 553, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 180, in __init__
    self.results = batch()
  File "C:\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "C:\Anaconda2\lib\site-packages\sklearn\cross_validation.py", line 1550, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "C:\Anaconda2\lib\site-packages\sklearn\cross_validation.py", line 1606, in _score
    score = scorer(estimator, X_test, y_test)
  File "C:\Anaconda2\lib\site-packages\sklearn\metrics\scorer.py", line 175, in __call__
    y_pred = y_pred[:, 1]
IndexError: index 1 is out of bounds for axis 1 with size 1

This code work fine when instead of using scoring = "roc_auc" I used accuracy metric. Can anyone explain me what's wrong? Have anyone had similiar problem?

Dalupus · Accepted Answer · 2016-06-03T00:39:56.770

4

There is a bug in the KerasClassifier that is causing this issue. I have opened an issue for it on the repo. https://github.com/fchollet/keras/issues/2864

The fix is also in there. You can define your own KerasClassifier in the mean time as a temporary workaround.

class FixedKerasClassifier(KerasClassifier):
    def predict_proba(self, X, **kwargs):
        kwargs = self.filter_sk_params(Sequential.predict_proba, kwargs)
        probs = self.model.predict_proba(X, **kwargs)
        if(probs.shape[1] == 1):
            probs = np.hstack([1-probs,probs]) 
        return probs

edited Jun 03 '16 at 00:39

answered Jun 01 '16 at 13:15

Dalupus

1,100
9
19

I'd like to thank you very much - your solution did a great job. Cheers :) – Marcin Możejko Jun 02 '16 at 22:30
Excellent solution! – MTT Sep 15 '17 at 18:53

Keras Wrappers for Scikit Learn - AUC scorer is not working

1 Answers1

Linked