I need to train a decision tree classifier on a dataset. The classifier can't handle categorical data, but this dataset has several columns with categorical (string) values such as 'RL'.
I know this can be handled with the get_dummies function, but I couldn't get it to work. I first read the dataset like this:
import pandas as pd

def load_data(fname):
    """Load CSV file"""
    df = pd.read_csv(fname)
    nc = df.shape[1]
    matrix = df.values                      # 2D NumPy array of all columns
    table_X = matrix[:, 2:]                 # features: column 2 onwards
    table_y = matrix[:, 81]                 # target: column 81
    features_names = df.columns.values[1:]
    target = df.columns.values[81]
    return table_X, table_y

table_X, table_y = load_data("dataset.csv")
pd.get_dummies(table_X)
When I run this, I get the following exception: Exception: Data must be 1-dimensional
What am I doing wrong?
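A likely cause, as far as I can tell, is that load_data returns df.values, so get_dummies is handed a 2D NumPy array rather than a DataFrame. Note also that the slice 2: still includes column 81, the target. Below is a minimal sketch of the DataFrame-based approach. The small table and its column names (Id, MSZoning, LotArea, PriceBand) are made up for illustration and are not taken from dataset.csv.

```python
import pandas as pd

# Hypothetical stand-in for dataset.csv: one categorical feature, one numeric
# feature, and a categorical target. All column names are assumptions.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "LotArea": [8450, 9600, 11250, 9550],
    "PriceBand": ["<200000", "[200000,400000]", ">400000", "<200000"],
})

# Keep the features as a DataFrame (not df.values) and leave the target out.
features = df.drop(columns=["Id", "PriceBand"])
target = df["PriceBand"]

# get_dummies one-hot encodes the string columns and passes numeric ones through.
encoded_X = pd.get_dummies(features)
print(encoded_X.columns.tolist())
```

Each distinct value of the categorical column becomes its own indicator column, while the numeric column is left unchanged.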
------------------------------- EDIT ------------------------------------
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(table_y)
le.classes_
le.transform(['<200000', '>400000', '[200000,400000]'])
To apply the decision tree algorithm:
from sklearn import tree
dtc_Gini = tree.DecisionTreeClassifier()  # criterion='gini' is the default
dtc_Gini1 = dtc_Gini.fit(table_X, y)
This fails with: ValueError: could not convert string to float: 'RL'
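The ValueError suggests that table_X still contains raw strings such as 'RL', since only the target was encoded. A rough sketch of encoding both the features and the target before fitting is shown below. It again uses a made-up table with hypothetical column names, so treat it as an illustration under those assumptions rather than a drop-in fix for dataset.csv.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data; the column names are assumptions for illustration only.
df = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "LotArea": [8450, 9600, 11250, 9550],
    "PriceBand": ["<200000", "[200000,400000]", ">400000", "<200000"],
})

# One-hot encode the features; dtype=int yields 0/1 integer indicator columns.
X = pd.get_dummies(df[["MSZoning", "LotArea"]], dtype=int)

# Encode the string target as integer class labels.
le = LabelEncoder()
y = le.fit_transform(df["PriceBand"])

# Every feature column is numeric now, so fit no longer sees strings.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

# Map predicted integer labels back to the original class names.
predicted = le.inverse_transform(clf.predict(X))
print(predicted)
```

When predicting on new data, the new rows need the same dummy columns in the same order as the training frame, for example by reindexing against X.columns.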