Sklearn cross_val_score with multi input KerasClassifier

Question

The goal is to perform cross-validation on a Keras model with multiple inputs. This works fine with a normal Sequential model that has only one input. However, when using the functional API and extending to two inputs, sklearn's cross_val_score does not seem to work as expected.

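import tensorflow as tf
from keras.layers import Input, Lambda, Dense, Concatenate
from keras.models import Model

# UniversalEmbedding (not shown) maps the input strings to 512-dimensional vectors
def create_model():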
    input_text = Input(shape=(1,), dtype=tf.string)
    embedding = Lambda(UniversalEmbedding, output_shape=(512, ))(input_text)
    dense = Dense(256, activation='relu')(embedding)

    input_title = Input(shape=(1,), dtype=tf.string)
    embedding_title = Lambda(UniversalEmbedding, output_shape=(512, ))(input_title)
    dense_title = Dense(256, activation='relu')(embedding_title)

    out = Concatenate()([dense, dense_title])

    pred = Dense(2, activation='softmax')(out)
    model = Model(inputs=[input_text, input_title], outputs=pred)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model

Part that fails:

keras_classifier = KerasClassifier(build_fn=create_model, epochs=10, batch_size=10, verbose=1)
cv = StratifiedKFold(n_splits=10, random_state=0)
results = cross_val_score(keras_classifier, [X1, X2], y, cv=cv, scoring='f1_weighted')

Error:

Traceback (most recent call last):
  File "func.py", line 73, in <module>
    results = cross_val_score(keras_classifier, [X1, X2], y, cv=cv, scoring='f1_weighted')
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 402, in cross_val_score
    error_score=error_score)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 225, in cross_validate
    X, y, groups = indexable(X, y, groups)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 260, in indexable
    check_consistent_length(*result)
  File "/home/timisb/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 235, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [2, 643]

Does anyone have an alternative approach to this, or suggestions for a solution? Thanks!

Here is a workaround for passing multiple inputs: https://stackoverflow.com/questions/56824968/grid-search-for-keras-with-multiple-inputs/62512554#62512554 – Marco Cerliani, Jul 01 '20 at 09:52

Henryk Borzymowski · Answer 1 · 2020-07-01T10:29:49.823

You could write your own cross-validation implementation. An example could look like this:

import numpy as np
from sklearn.model_selection import StratifiedKFold

input_1 = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
input_2 = [[11], [12], [13], [14], [15], [16], [17], [18], [19], [20]]
Y = [[0], [0], [0], [2], [2], [0], [1], [1], [2], [0]]

# Split a two-input dataset into stratified k folds
def cross_validation_split(X1, X2, Y, folds=4):
    # Convert once so the inputs can be indexed with integer index arrays
    X1 = np.array(X1)
    X2 = np.array(X2)
    Y = np.array(Y)
    # Use the folds argument instead of a hard-coded number of splits
    skf = StratifiedKFold(n_splits=folds, shuffle=True)
    dataset_split = []
    # Both inputs share the same row indices, so splitting on X1 is enough.
    # Y is flattened because StratifiedKFold expects 1-D labels.
    # With this toy data, sklearn warns that class 1 has fewer members than folds.
    for i, (train_index, test_index) in enumerate(skf.split(X1, Y.ravel())):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_1_train, X_1_test = X1[train_index], X1[test_index]
        X_2_train, X_2_test = X2[train_index], X2[test_index]
        y_train, y_test = Y[train_index], Y[test_index]
        k_fold_set = {
            'k_fold': i,
            'train': {'X_1': X_1_train, 'X_2': X_2_train, 'Y': y_train},
            'test': {'X_1': X_1_test, 'X_2': X_2_test, 'Y': y_test}
        }
        dataset_split.append(k_fold_set)

    return dataset_split

result = cross_validation_split(input_1, input_2, Y, folds=4)

Then simply loop over the returned list, run your training/validation logic on each fold, and save the per-fold results in a list, which will hold the results of your k-fold cross-validation.
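A minimal sketch of that loop, under some assumptions: the folds use the same dictionary layout as above, fit_and_score is a hypothetical helper name, the toy data is made up, and a scikit-learn LogisticRegression stands in for the Keras model so the example runs without TensorFlow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Toy data (assumption): two numeric inputs with the same number of rows
rng = np.random.default_rng(0)
X1 = rng.normal(size=(40, 3))
X2 = rng.normal(size=(40, 2))
y = np.array([0, 1] * 20)

# Folds in the same dictionary layout that cross_validation_split returns
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
folds = [
    {
        'k_fold': i,
        'train': {'X_1': X1[tr], 'X_2': X2[tr], 'Y': y[tr]},
        'test': {'X_1': X1[te], 'X_2': X2[te], 'Y': y[te]},
    }
    for i, (tr, te) in enumerate(skf.split(X1, y))
]

def fit_and_score(fold):
    """Hypothetical helper: train on one fold, return its weighted F1."""
    train, test = fold['train'], fold['test']
    # Stand-in model (assumption). With the two-input Keras model this would be
    # a fresh create_model() followed by model.fit([X_1, X_2], Y, ...).
    model = LogisticRegression()
    model.fit(np.hstack([train['X_1'], train['X_2']]), train['Y'])
    pred = model.predict(np.hstack([test['X_1'], test['X_2']]))
    return f1_score(test['Y'], pred, average='weighted')

# One score per fold, then the usual summary statistic
scores = [fit_and_score(fold) for fold in folds]
mean_score = float(np.mean(scores))
print(scores, mean_score)
```

Whatever model is used, it should be built from scratch inside the loop for every fold, so that weights learned on one fold do not leak into the next.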

score 2 · Accepted Answer · answered Feb 20 '19 at 11:06

I found the reason, which is stated in the Keras documentation:

You can use Sequential Keras models (single-input only) as part of your Scikit-Learn workflow via the wrappers found at keras.wrappers.scikit_learn.py.

https://keras.io/scikit-learn-api/
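Given that restriction, one possible way to stay inside the scikit-learn workflow is to pack both inputs into a single array and split them again inside a custom estimator. The following is only a sketch of that idea, not part of the Keras wrapper API: SplitInputClassifier and n_cols_first are hypothetical names, the data is made up, and a LogisticRegression stands in for the two-input Keras model so the example runs without TensorFlow.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

class SplitInputClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: takes one packed array and splits it in two."""

    def __init__(self, n_cols_first=1):
        self.n_cols_first = n_cols_first

    def _split(self, X):
        # Undo the packing: the first n_cols_first columns are input 1
        X = np.asarray(X)
        return X[:, :self.n_cols_first], X[:, self.n_cols_first:]

    def fit(self, X, y):
        X1, X2 = self._split(X)
        # Stand-in (assumption) for model.fit([X1, X2], y) on a Keras model
        self.model_ = LogisticRegression().fit(np.hstack([X1, X2]), y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        X1, X2 = self._split(X)
        # Stand-in (assumption) for an argmax over model.predict([X1, X2])
        return self.model_.predict(np.hstack([X1, X2]))

# Toy data (assumption): two single-column inputs with matching rows
rng = np.random.default_rng(0)
X1 = rng.normal(size=(60, 1))
X2 = rng.normal(size=(60, 1))
y = (X1[:, 0] + X2[:, 0] > 0).astype(int)

# One array with 60 rows, so sklearn sees as many samples in X as in y
X = np.hstack([X1, X2])

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(SplitInputClassifier(n_cols_first=1), X, y,
                         cv=cv, scoring='f1_weighted')
print(scores)
```

Because cross_val_score only ever sees one array, the fold indices apply to both inputs at once and the rows stay aligned.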

Nori


Here is a workaround for passing multiple inputs: https://stackoverflow.com/questions/56824968/grid-search-for-keras-with-multiple-inputs/62512554#62512554 – Marco Cerliani Jul 01 '20 at 10:30

score 1 · Answer 3 · answered Dec 23 '18 at 19:43

The error comes from scikit-learn's cross_val_score function: "ValueError: Found input variables with inconsistent numbers of samples: [2, 643]" means that X and y do not appear to contain the same number of samples.

It looks like sklearn requires a different data shape.

You could use data.reshape().
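To illustrate where the [2, 643] comes from, here is a small sketch with made-up data of the same size as in the question. It assumes that the 2 is the length of the outer Python list [X1, X2] and that the 643 is the number of labels. If so, reshaping the individual arrays would not change anything, while stacking them column-wise into one array would satisfy the length check.

```python
import numpy as np
from sklearn.utils import check_consistent_length

# Made-up stand-ins (assumption) for the question's 643 text/title pairs
X1 = np.array([["text %d" % i] for i in range(643)])
X2 = np.array([["title %d" % i] for i in range(643)])
y = np.zeros(643, dtype=int)

# sklearn measures the list itself, which has 2 elements, against 643 labels
try:
    check_consistent_length([X1, X2], y)
    message = "no error"
except ValueError as err:
    message = str(err)
print(message)

# Stacked column-wise, the two inputs form one array with 643 rows
X_stacked = np.hstack([X1, X2])
check_consistent_length(X_stacked, y)  # raises nothing
print(X_stacked.shape)
```

The stacked array then has to be split back into its two columns before it reaches the two-input model.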

General tip: I think needing cross-validation is generally an indicator of "not having enough training data". The Keras team, and the TensorFlow team in general, did not pay too much attention to providing CV features.

prosti


Indeed, I don't have a lot of data; I am using transfer learning. Where could I reshape the data? Could you provide an example? – Isbister Dec 23 '18 at 20:22
