I'm using the following dataset from Kaggle: https://www.kaggle.com/harlfoxem/housesalesprediction. When I try to convert the zipcode to categorical dummy variables with OneHotEncoder, I get a weird output (a 0-dimensional array). I'm using the following code to fit and transform the dataset:

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [11])], remainder='passthrough') 
X = np.array(ct.fit_transform(X))

In the above code, X is a numpy.ndarray of shape (21613, 16). The column I want to convert is 'zipcode'; it has dtype int64 and is located at index 11 (e.g. for the first row -> X[0][11]).

When I run the above ColumnTransformer I don't get any error. But when I inspect X after the fit_transform, I get a 0-dimensional array:

(Screenshot: output of X after fit_transform)

The output does not look normal to me, so I tried different modifications but could not solve it. I finally got it working by using the pandas get_dummies method, but I still want to understand why I couldn't do it with OneHotEncoder.

I'm following the Machine Learning A-Z course from Udemy, and I noticed that the only step the instructor performed before encoding the categorical data was handling missing data with SimpleImputer. Since I don't have any missing data, I didn't perform this step. I'm not sure whether this is related to my issue.
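
For reference, here is a minimal sketch that seems to reproduce the symptom. It is not the real dataset: the array below is made-up data with 200 rows, 16 columns and 50 distinct integer codes in column 11, and the variable names are only for illustration. It assumes the cause is that OneHotEncoder produces a sparse matrix by default, so that ColumnTransformer returns a SciPy sparse matrix when the combined output is mostly zeros. Calling np.array on such an object wraps it in a 0-dimensional object array instead of converting it.

```python
import numpy as np
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the housing data (NOT the Kaggle file):
# 200 rows, 16 numeric columns, 50 distinct integer codes in column 11.
rng = np.random.default_rng(0)
X = rng.random((200, 16))
X[:, 11] = np.arange(200) % 50 + 98000

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [11])],
    remainder="passthrough",
)
encoded = ct.fit_transform(X)

# With many categories the combined output is mostly zeros, so the
# transformer is expected to hand back a SciPy sparse matrix here.
is_sparse = sparse.issparse(encoded)

# np.array() does not densify a sparse matrix; it wraps the whole object.
wrapped = np.array(encoded)

# Converting explicitly yields the ordinary 2-D array.
dense = encoded.toarray() if is_sparse else np.asarray(encoded)

print(is_sparse, wrapped.shape, dense.shape)
```

If that assumption holds for the real data, replacing np.array(ct.fit_transform(X)) with the explicit toarray() conversion above, or passing sparse_threshold=0 to ColumnTransformer so that it returns a dense array, should give the expected 2-D result. Under this reading the skipped SimpleImputer step would not be related.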

Frederik
