First of all, I would like to apologize if this is a no-brainer, but I am still relatively new at this.
I have a dataset with a little over 2,000 rows and a few columns. The last column is the label that I want to predict.
For example, the dataset looks roughly like this:
Habitat  Diet       Class    Family           Weight(kg)  Label
Land     Herbivore  Mammals  Bovidae          200.00      Cattle
Sea      Carnivore  Mammals  Balaenopteridae  2100.00     Baleen Whale
Sea      Herbivore  Mammals  Trichechidae     540.00      Manatee
The Label column has been one-hot encoded using scikit-learn and joined back onto the original dataframe, which now looks like this:
Habitat  Diet       Class    Family           Weight(kg)  Label_0  Label_1  Label_2
Land     Herbivore  Mammals  Bovidae          200.00      1.0      0.0      0.0
Sea      Carnivore  Mammals  Balaenopteridae  2100.00     0.0      1.0      0.0
Sea      Herbivore  Mammals  Trichechidae     540.00      0.0      0.0      1.0
The original Label column is dropped after that. Beyond this point I do not know how to proceed, because this is my first time working on this kind of problem in practice.
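For reference, the encoding step described above might look roughly like the sketch below. It assumes a pandas DataFrame holding the sample rows (`df`, `encoder`, and `encoded` are placeholder names, not taken from the actual code) and uses scikit-learn's `OneHotEncoder`. The real code may differ. One thing to keep in mind is that `OneHotEncoder` sorts categories alphabetically, so which animal ends up in `Label_0` may not match the table.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Sample rows from the table; `df` is a placeholder name.
df = pd.DataFrame({
    "Habitat": ["Land", "Sea", "Sea"],
    "Diet": ["Herbivore", "Carnivore", "Herbivore"],
    "Class": ["Mammals", "Mammals", "Mammals"],
    "Family": ["Bovidae", "Balaenopteridae", "Trichechidae"],
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label": ["Cattle", "Baleen Whale", "Manatee"],
})

# One-hot encode only the Label column. The encoder returns a sparse
# matrix by default, so convert it to a dense array.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(df[["Label"]]).toarray()

# Name the new columns Label_0, Label_1, ... in the encoder's category
# order (alphabetical), then join them back and drop the original Label.
label_cols = [f"Label_{i}" for i in range(onehot.shape[1])]
onehot_df = pd.DataFrame(onehot, columns=label_cols, index=df.index)
encoded = pd.concat([df.drop(columns="Label"), onehot_df], axis=1)

print(encoded)
```

After this step only the label is numeric and one-hot encoded. Habitat, Diet, Class, and Family are still plain strings.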
I have also split the data into training and test sets (based on an internet tutorial that I followed).
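The split is probably something along the lines of the sketch below. This is an assumption: the tutorial's exact code is not shown, the `test_size` and `random_state` values are arbitrary, and the stand-in dataframe simply repeats the three sample rows so that there is something to split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the encoded dataframe: the three sample rows, repeated twice.
sample = pd.DataFrame({
    "Habitat": ["Land", "Sea", "Sea"],
    "Diet": ["Herbivore", "Carnivore", "Herbivore"],
    "Class": ["Mammals", "Mammals", "Mammals"],
    "Family": ["Bovidae", "Balaenopteridae", "Trichechidae"],
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label_0": [1.0, 0.0, 0.0],
    "Label_1": [0.0, 1.0, 0.0],
    "Label_2": [0.0, 0.0, 1.0],
})
encoded = pd.concat([sample, sample], ignore_index=True)

# Assumed separation: the Label_* columns are the target, the rest are features.
label_cols = [c for c in encoded.columns if c.startswith("Label_")]
X = encoded.drop(columns=label_cols)
y = encoded[label_cols]

# test_size and random_state are arbitrary values chosen for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# The feature columns still contain strings such as "Land" or "Herbivore".
print(X_train.dtypes)
```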
But when I try the code below, it doesn't work:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
It gives the error message below:
ValueError: could not convert string to float: 'Value from one of the column'
How do I approach this problem and proceed properly to classification?
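One possible reading of the error is that the feature columns (Habitat, Diet, Class, Family) are still strings, and scikit-learn's decision tree needs numeric features. The sketch below shows one way this could be handled, under stated assumptions: the categorical features are one-hot encoded inside a `ColumnTransformer`, the weight column is passed through unchanged, and the original string Label column is used directly as the target (scikit-learn classifiers accept string class labels, so one-hot encoding the label is not required). The variable names are placeholders and the data is just the three sample rows, so this is an illustration rather than a tuned model.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Sample rows from the question, with the original (string) Label column.
df = pd.DataFrame({
    "Habitat": ["Land", "Sea", "Sea"],
    "Diet": ["Herbivore", "Carnivore", "Herbivore"],
    "Class": ["Mammals", "Mammals", "Mammals"],
    "Family": ["Bovidae", "Balaenopteridae", "Trichechidae"],
    "Weight(kg)": [200.0, 2100.0, 540.0],
    "Label": ["Cattle", "Baleen Whale", "Manatee"],
})

X = df.drop(columns="Label")
y = df["Label"]

# Fitting on the raw string features is expected to fail with a ValueError
# like the one in the question.
try:
    DecisionTreeClassifier().fit(X, y)
except ValueError as exc:
    print("raw fit fails:", exc)

# One-hot encode the string feature columns; pass Weight(kg) through as-is.
categorical = ["Habitat", "Diet", "Class", "Family"]
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# Chain preprocessing and the classifier so the same encoding is applied
# at fit time and at predict time.
model = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
model.fit(X, y)

predictions = model.predict(X)
print(predictions)
```

With a real train/test split, the pipeline would be fitted on the training rows only and then used to predict on the test rows. `handle_unknown="ignore"` keeps categories that appear only in the test set from raising an error.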