
I am trying to apply feature selection to categorical variables with SelectKBest and chi2 (SelectKBest(chi2, k=5)), but I get a ValueError saying that a string could not be converted to float. I know the workaround is to convert the categorical variables to dummies using pd.get_dummies(). But why is that necessary? The chi-square test is meant for bivariate analysis of two categorical variables, so why doesn't it accept categorical variables directly?

kevins_1
GreenHeart

1 Answer


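In this context chi-square measures the correspondence between feature values (in X) and classes (in y). To do that it needs the relative class frequencies from the target variable and the sum of each feature's values within each class. It then compares the "expected" distribution of values per class (i.e. the total sum of each feature's values multiplied by the class frequencies) with the observed distribution (i.e. the sum of the actual values for each class) to get the chi-square value. Because the statistic is built from sums of feature values, X has to be numeric and non-negative; strings cannot be summed, which is why categorical columns must be encoded (e.g. as dummies) first. See here for details.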
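The computation described above can be sketched in plain NumPy. This is only an illustration of the idea, not scikit-learn's actual source: the helper name chi2_scores and the toy "color" data are made up for this example, and the one-hot encoding is done by hand to stand in for pd.get_dummies().

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score per column of a non-negative numeric matrix X.

    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    # One-hot encode the target: Y[i, c] = 1 if sample i belongs to class c.
    Y = (np.asarray(y)[:, None] == classes[None, :]).astype(float)

    observed = Y.T @ X               # sum of feature values within each class
    class_prob = Y.mean(axis=0)      # relative class frequencies
    feature_total = X.sum(axis=0)    # total sum of each feature
    expected = np.outer(class_prob, feature_total)

    # Sum the per-class contributions to get one score per feature.
    return ((observed - expected) ** 2 / expected).sum(axis=0)

# Made-up toy data: one categorical feature and a binary target.
color = np.array(["red", "red", "blue", "blue", "green", "green"])
y = np.array([1, 1, 0, 0, 1, 0])

# Strings cannot be summed, so encode them as 0/1 indicator columns first.
levels = np.unique(color)  # sorted: blue, green, red
X = (color[:, None] == levels[None, :]).astype(float)

scores = chi2_scores(X, y)
print(dict(zip(levels.tolist(), scores.tolist())))
```

On this toy data "blue" and "red" each score 2.0 because they line up perfectly with one class, while "green" scores 0.0 because it is split evenly between the classes. Passing the raw color strings instead of X would fail at the float conversion, which is the same kind of error the question describes.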

hellpanderr