I want to use `permutation_importance` to calculate feature importance. From the documentation, I understood that `X_train` needs to be an array and `y_train` needs to be array-like. However, I get `AttributeError: 'numpy.ndarray' object has no attribute 'lower'`.
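For comparison, here is a minimal sketch that uses hypothetical random data and a plain `LogisticRegression` rather than the saved model. It shows that `permutation_importance` accepts a numeric 2-D ndarray for `X` and an array of string labels for `y` when the estimator was itself trained on numeric features:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical numeric data: 100 samples, 5 features, string labels 'A'/'B'
X = rng.random((100, 5))
y = np.where(X[:, 0] > 0.5, "A", "B").astype(object)

# The estimator was fitted on numeric features, so a numeric ndarray is valid input
clf = LogisticRegression().fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=42)
print(result.importances_mean.shape)  # (5,)
```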
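My code:

```python
import joblib
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.inspection import permutation_importance

# df is a pandas DataFrame with a text column 'x' and a label column 'y'
vectoriser = TfidfVectorizer(ngram_range=(2, 3), norm=None)
X_train = vectoriser.fit_transform(df['x'])   # sparse TF-IDF matrix
X_train = np.nan_to_num(X_train.toarray())    # dense numeric ndarray
y_train = df['y'].values

# Undersampling
rus = RandomUnderSampler(random_state=0)
X_train, y_train = rus.fit_resample(X_train, y_train)

# Load the saved model
clf = joblib.load('model.joblib')

# Calculate result --> triggers the error
result = permutation_importance(clf, X_train, y_train, n_repeats=10, random_state=42)
```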
The data:

```
X_train.shape = (1068, 3528)
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

y_train.shape = (1068,)
array(['A', 'A', 'A', ..., 'B', 'B', 'B'], dtype=object)
```
I am passing the data types the documentation asks for, so why do I get this error? And what does the error mean? Thanks.
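One possible reading of the error, offered as a sketch under an assumption the question does not confirm: suppose `model.joblib` holds a `Pipeline` whose first step is a `TfidfVectorizer`. In that case the model expects raw text, not an already-vectorised matrix. The toy corpus, labels and pipeline below are hypothetical stand-ins. With its default `lowercase=True`, `TfidfVectorizer` calls `.lower()` on every document it is given. When it receives a 2-D ndarray, each "document" is a numeric row, which reproduces the same `AttributeError`. The last part shows one workaround under the same assumption: vectorise with the pipeline's own fitted steps, then run `permutation_importance` on the final estimator only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for df['x'] and df['y']
docs = ["good movie plot", "bad movie plot", "good acting here", "bad acting here"] * 5
labels = np.array(["A", "B", "A", "B"] * 5, dtype=object)

# ASSUMPTION: the saved model is a Pipeline that vectorises raw text itself
clf = make_pipeline(TfidfVectorizer(ngram_range=(2, 3), norm=None), LogisticRegression())
clf.fit(docs, labels)

# Passing an already-vectorised numeric matrix: the pipeline's TfidfVectorizer
# iterates over the rows and calls .lower() on each one, but every row is a
# numpy array rather than a string.
X_numeric = TfidfVectorizer(ngram_range=(2, 3), norm=None).fit_transform(docs).toarray()
try:
    clf.predict(X_numeric)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'numpy.ndarray' object has no attribute 'lower'

# Possible workaround: transform the text with the pipeline's fitted vectoriser
# (clf[:-1]) and score only the final estimator (clf[-1]), so the features
# match the vocabulary the classifier was trained on.
vectorised = clf[:-1].transform(docs).toarray()
result = permutation_importance(clf[-1], vectorised, labels, n_repeats=10, random_state=42)
print(result.importances_mean.shape)
```

If the saved model is instead a bare classifier, this explanation would not apply and the cause lies elsewhere.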