I'm writing a class that does one-hot encoding, but it doesn't work as I expected.
In my main code I have this:
for col in train_x_categorical.columns:
    dataCleaner.addFeatureToBeOneHotEncoded(col)
dataCleaner.applyOneHotEncoding(train_x_categorical)
train_x_categorical.head()
The class methods are the following:
def addFeatureToBeOneHotEncoded(self, featureName):
    self._featuresToBeOneHotEncoded.append(featureName)

def applyOneHotEncoding(self, data):
    for feature in self._featuresToBeOneHotEncoded:
        dummies = pd.get_dummies(data[feature])
        dummies.drop(dummies.columns[-1], axis=1, inplace=True)
        data.drop(feature, axis=1, inplace=True)
        data = pd.concat([data, dummies], axis=1)
    print(data.columns)
Now, with print(data.columns) I can see that the method works correctly, but when train_x_categorical.head() runs I can't see the effect of the method applyOneHotEncoding.
I don't understand why this is happening and how to fix it.
I thought that since Python passes values by reference, the variable data points to the same object as the variable train_x_categorical, so in the method applyOneHotEncoding I was working on the same object, but clearly I am wrong.
Can someone explain to me why my reasoning is wrong and how I can solve the problem?
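For reference, the same symptom can be reproduced without pandas. The following is only a minimal sketch using plain lists; the functions rebind and mutate are hypothetical names introduced for illustration and are not part of the code above. It shows that a function receives a reference to the caller's object, but assigning to the parameter name inside the function only makes the local name point at a different object.

```python
# Illustration with plain lists; rebind and mutate are hypothetical helpers.

def rebind(items):
    # Assignment makes the local name refer to a NEW list.
    # The caller's list is not modified.
    items = items + ["extra"]
    return items


def mutate(items):
    # An in-place method changes the object the caller also refers to.
    items.append("extra")


original = ["a", "b"]

returned = rebind(original)
print(original)   # ['a', 'b']
print(returned)   # ['a', 'b', 'extra']

mutate(original)
print(original)   # ['a', 'b', 'extra']
```

In applyOneHotEncoding, data.drop(..., inplace=True) behaves like mutate, while data = pd.concat(...) behaves like rebind: pd.concat returns a new DataFrame, so after that line data no longer refers to the same object as train_x_categorical. One possible fix, assuming the method can be changed, is to add return data at the end of the method and write train_x_categorical = dataCleaner.applyOneHotEncoding(train_x_categorical) at the call site.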