I'm really struggling with encoding categorical types. Given two DataFrames, X_train and X_test, I'm trying to encode all the zip codes. The biggest obstacle for me is encoding the values from both DataFrames in the same way, since the two sets of zip codes differ to some extent. So I thought of building a list of all possible zip code values and then using it to encode both series (the zip_code columns of the DataFrames). Unfortunately, this doesn't work: I get the error AttributeError: 'numpy.ndarray' object has no attribute 'transform'. I'm running out of ideas.
from sklearn.preprocessing import LabelEncoder

X_train = X[['ticket_id','judgment_amount','zip_code']]
X_test = y[['ticket_id','judgment_amount','zip_code']]
# Unique non-null zip codes in each frame
Xtrain_zipcode = X_train['zip_code'].dropna().unique().tolist()
Xtest_zipcode = X_test['zip_code'].dropna().unique().tolist()
# Build one list covering the zip codes from both frames
zip_list = Xtrain_zipcode
for elem in Xtest_zipcode:
    if elem not in Xtrain_zipcode:
        zip_list.append(elem)
enc_zipc = LabelEncoder().fit(zip_list)
encoded = enc_zipc.transform(zip_list)  # NumPy array of integer codes
encoded.transform(X_train['zip_code'])  # this line raises the AttributeError
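For reference, here is a minimal sketch of what the snippet above seems to be aiming for. The traceback points at the last line: transform is called on encoded, which is already the NumPy array returned by the encoder, rather than on the fitted encoder enc_zipc. The toy frames and zip code values below are made up for illustration and stand in for the real X_train and X_test; missing zip codes would still need to be handled before calling transform.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the real X_train / X_test.
X_train = pd.DataFrame({"zip_code": ["48201", "48202", "48201"]})
X_test = pd.DataFrame({"zip_code": ["48202", "48203"]})

# Union of the zip codes seen in either frame, with NaN excluded.
zip_list = pd.concat([X_train["zip_code"], X_test["zip_code"]]).dropna().unique()

# fit() returns the encoder itself; transform() returns a NumPy array.
enc_zipc = LabelEncoder().fit(zip_list)

# Call transform on the fitted encoder, not on an already transformed array.
train_codes = enc_zipc.transform(X_train["zip_code"])
test_codes = enc_zipc.transform(X_test["zip_code"])
```

Because the encoder is fitted once on the combined list, the same zip code maps to the same integer in both frames.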
I have also read that LabelEncoder is not advisable for categorical features that are used as model inputs. What would you suggest instead? One-hot encoding?
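To make the one-hot option concrete, here is a small sketch using scikit-learn's OneHotEncoder with handle_unknown="ignore". It is only one possible approach, not a definitive recommendation. The toy frames and zip code values are again made up for illustration. The encoder is fitted on the training column only, and any zip code that appears only in the test set is encoded as a row of zeros, so no combined list is needed.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real X_train / X_test.
X_train = pd.DataFrame({"zip_code": ["48201", "48202", "48201"]})
X_test = pd.DataFrame({"zip_code": ["48202", "48203"]})

# handle_unknown="ignore" encodes categories unseen during fit as all zeros
# instead of raising an error at transform time.
ohe = OneHotEncoder(handle_unknown="ignore")

# OneHotEncoder expects 2-D input, hence the double brackets.
train_ohe = ohe.fit_transform(X_train[["zip_code"]])  # sparse matrix, one column per zip code
test_ohe = ohe.transform(X_test[["zip_code"]])

# Convert to dense arrays only for inspection.
print(train_ohe.toarray())
print(test_ohe.toarray())
```

With many distinct zip codes the result may become very wide, which is why the output is kept sparse by default.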