
I want to use permutation_importance to calculate feature importance. From the documentation, I understood that X needs to be an array and y needs to be array-like. However, I get AttributeError: 'numpy.ndarray' object has no attribute 'lower'.

My code

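import joblib
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.inspection import permutation_importance

# df is a DataFrame with a text column 'x' and a label column 'y'
vectoriser = TfidfVectorizer(ngram_range=(2,3), norm=None)
X_train = vectoriser.fit_transform(df['x'])   # sparse TF-IDF matrix
X_train = np.nan_to_num(X_train.toarray())     # densify, then replace any NaN with 0
y_train = df['y'].values

# Undersampling
rus = RandomUnderSampler(random_state=0)
X_train, y_train = rus.fit_resample(X_train, y_train)

# Load the saved model
clf = joblib.load('model.joblib')

# Calculate result --> triggered error
result = permutation_importance(clf, X_train, y_train, n_repeats=10, random_state=42)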
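A minimal reproduction, under the assumption (not stated in the question) that model.joblib contains a scikit-learn Pipeline whose first step is a TfidfVectorizer. The toy corpus, labels, and LogisticRegression below are hypothetical stand-ins. In that setup, permutation_importance hands the pipeline the already-vectorized numeric matrix, the pipeline passes each row to its own vectorizer, and the vectorizer's lowercasing step calls .lower() on what it expects to be a string but is actually a 1-D ndarray row. That would produce exactly this message.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for df['x'] / df['y'].
texts = ["good movie", "great film", "bad movie", "awful film"] * 5
labels = np.array(["A", "A", "B", "B"] * 5, dtype=object)

# Assumption: the saved model is a pipeline that starts with a text vectorizer.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# Pre-vectorized dense features, like X_train in the question.
X_dense = TfidfVectorizer().fit_transform(texts).toarray()

error_message = None
try:
    permutation_importance(clf, X_dense, labels, n_repeats=2, random_state=42)
except AttributeError as exc:
    # Each row of X_dense reaches the pipeline's TfidfVectorizer as a 1-D
    # ndarray, and the vectorizer's preprocessing calls row.lower() on it.
    error_message = str(exc)
    print(error_message)
```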

The data

X_train.shape = (1068, 3528)

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

y_train.shape = (1068,)

array(['A', 'A', 'A', ..., 'B', 'B', 'B'],
      dtype=object)

I am passing the data types the documentation asks for, so why do I get this error? Also, what does the error mean? Thanks.

Osca
