How do I use a dataset that has been one-hot encoded by scikit-learn to train a decision tree?

Question

First of all, I would like to apologize if this is a no-brainer, but I am still relatively new at this.

I have a dataset with 2,000+ rows and a few columns. The last column is the label that I want to predict.

For example, the dataset looks roughly like this:

Habitat   Diet        Class     Family            Weight(kg)    Label
Land      Herbivore   Mammals   Bovidae           200.00        Cattle
Sea       Carnivore   Mammals   Balaenopteridae   2100.00       Baleen Whale
Sea       Herbivore   Mammals   Trichechidae      540.00        Manatee

The Label column has been one-hot encoded using scikit-learn and combined back into the original dataframe, which now looks like this:

Habitat   Diet        Class     Family            Weight(kg)    Label_0    Label_1  Label_2
Land      Herbivore   Mammals   Bovidae           200.00        1.0        0.0      0.0
Sea       Carnivore   Mammals   Balaenopteridae   2100.00       0.0        1.0      0.0
Sea       Herbivore   Mammals   Trichechidae      540.00        0.0        0.0      1.0
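For reference, the encoding step described above could be reproduced roughly as follows. This is only a sketch built on the three sample rows; the variable names (`df`, `encoder`, `encoded`) and the `Label_{i}` naming scheme are assumptions, not the code actually used. Note that `OneHotEncoder` orders categories alphabetically, so which label lands in `Label_0` may differ from the table above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data copied from the sample rows in the question.
df = pd.DataFrame({
    "Habitat": ["Land", "Sea", "Sea"],
    "Diet": ["Herbivore", "Carnivore", "Herbivore"],
    "Class": ["Mammals", "Mammals", "Mammals"],
    "Family": ["Bovidae", "Balaenopteridae", "Trichechidae"],
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label": ["Cattle", "Baleen Whale", "Manatee"],
})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
onehot = encoder.fit_transform(df[["Label"]]).toarray()

# Hypothetical column names matching the Label_0, Label_1, ... pattern.
label_cols = [f"Label_{i}" for i in range(onehot.shape[1])]
onehot_df = pd.DataFrame(onehot, columns=label_cols, index=df.index)

# Drop the original Label column and attach the encoded columns.
encoded = pd.concat([df.drop(columns="Label"), onehot_df], axis=1)

# encoder.categories_[0] records which label each Label_i column stands for.
print(encoder.categories_[0])
print(encoded)
```

The feature columns (Habitat, Diet, Class, Family) are still strings after this step; only the label has been encoded.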

The Label column is dropped after that. Beyond this point I do not know how to proceed, because this is my first time taking a practical approach to this subject.

I have also split the data into train and test sets (based on an internet tutorial that I followed).

But right now, when I try the code below, it doesn't work.

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)

It gives the error message below:

ValueError: could not convert string to float: 'Value from one of the column'

How do I approach this problem and proceed properly to classification?
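One possible direction, offered as a sketch rather than a confirmed fix: the error message suggests that `X_train` still contains string-valued feature columns (Habitat, Diet, Class, Family), which scikit-learn's trees cannot consume directly. The label itself can stay as a single string column, since `DecisionTreeClassifier` accepts string class labels for `y`. The example below one-hot encodes the feature columns inside a pipeline; the toy dataframe, the `categorical_cols` list, and the step names are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy data copied from the sample rows in the question.
df = pd.DataFrame({
    "Habitat": ["Land", "Sea", "Sea"],
    "Diet": ["Herbivore", "Carnivore", "Herbivore"],
    "Class": ["Mammals", "Mammals", "Mammals"],
    "Family": ["Bovidae", "Balaenopteridae", "Trichechidae"],
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label": ["Cattle", "Baleen Whale", "Manatee"],
})

X = df.drop(columns="Label")  # features
y = df["Label"]               # target stays a single column of strings

# Assumed list of string-valued feature columns.
categorical_cols = ["Habitat", "Diet", "Class", "Family"]

preprocess = ColumnTransformer(
    transformers=[
        # Encode the string features; categories unseen at fit time are ignored.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",  # numeric columns such as Weight(kg) pass through
)

model = Pipeline(steps=[
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# With a real dataset, fit on X_train / y_train from train_test_split instead.
model.fit(X, y)
print(model.predict(X))
```

Keeping the encoder inside the pipeline means the same transformation is applied automatically when calling `model.predict` on the test set.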

Does this answer your question? [How can I one hot encode in Python?](https://stackoverflow.com/questions/37292872/how-can-i-one-hot-encode-in-python) — Roshin Raphel, Jul 14 '20 at 10:24
@RoshinRaphel, apparently not, because I have already one-hot encoded the label column (the one I am targeting). My question is how to proceed after that: how do I take the new columns generated by OneHotEncoder and use them with sklearn's DecisionTreeClassifier()? In the tutorials I have watched or read (like this one: https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_classification_algorithms_decision_tree.htm), the prediction target is only one column (usually named label). — Lutfi, Jul 14 '20 at 15:59
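If only the encoded dataframe is available, the `Label_*` columns can be collapsed back into the single target column that the tutorials expect. The following is a sketch under that assumption; the small `encoded` dataframe stands in for the real one, and the `Label_` prefix is taken from the table in the question.

```python
import pandas as pd

# Stand-in for the encoded dataframe shown in the question (one feature kept).
encoded = pd.DataFrame({
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label_0": [1.0, 0.0, 0.0],
    "Label_1": [0.0, 1.0, 0.0],
    "Label_2": [0.0, 0.0, 1.0],
})

# Collect the one-hot label columns by their shared prefix.
label_cols = [c for c in encoded.columns if c.startswith("Label_")]

# For each row, idxmax(axis=1) returns the name of the column holding the 1.0,
# which gives one class identifier per row.
y = encoded[label_cols].idxmax(axis=1)

# Everything that is not a label column is a feature.
X = encoded.drop(columns=label_cols)

print(y.tolist())
```

If the fitted `OneHotEncoder` is still around, its `inverse_transform` method should recover the original label names instead of the `Label_i` placeholders.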


0 Answers