I tried the following code and keep getting the error below.

The link to the dataset is at the end of the question.

ValueError ---> line 18 ds1_model.fit(X, y)

ValueError: could not convert string to float: 'Iris-setosa'

import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv'
ds1 = pd.read_csv(url)
ds1.columns = (['SepalLength' , 'SepalWidth' , 'PetalLength' , 'PetalWidth' , 'ClassLabel'])
ds1_filtered=ds1.dropna(axis=0)

y = ds1_filtered.ClassLabel

ds1_features = ['SepalLength' , 'SepalWidth' , 'PetalLength' , 'PetalWidth']
X = ds1_filtered[ds1_features]

ds1_model = DecisionTreeRegressor()

ds1_model.fit(X, y)

PredictedClassLabel = ds1_model.predict(X)
mean_absolute_error(y, PredictedClassLabel)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 0)
ds1_model = DecisionTreeRegressor()
ds1_model.fit(train_X, train_y)

predictions = ds1_model.predict(val_X)
print(mean_absolute_error(val_y, predictions))

Can you please suggest or explain how to fix this?

DataSet Link

Sam

1 Answer

As the name ClassLabel implies, the iris dataset poses a classification problem, not a regression one; hence, DecisionTreeRegressor is not the correct model to use, nor is mean_absolute_error the correct metric.

You should use a DecisionTreeClassifier and accuracy_score instead:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
clf = DecisionTreeClassifier()

# load_iris() exposes the features as iris.data and the class labels as iris.target
train_X, val_X, train_y, val_y = train_test_split(iris.data, iris.target, random_state = 0)
clf.fit(train_X, train_y)

pred = clf.predict(val_X)
print(accuracy_score(val_y, pred))

The scikit-learn decision tree classification tutorial using the said dataset can give you more ideas.
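
For data laid out like the CSV in the question, the same idea could be sketched as follows. This is only an illustration under some assumptions: the file is taken to have no header row (hence header=None with names=..., which the question's code does not use), and sample_csv is a hypothetical in-memory stand-in holding six rows in that layout rather than the real download. Note that the string labels are passed to the classifier as they are.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the question's CSV: four numeric columns,
# then a string class label, and no header row (assumption).
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.4,3.2,4.5,1.5,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "5.8,2.7,5.1,1.9,Iris-virginica\n"
)
columns = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'ClassLabel']

# header=None keeps the first data row from being consumed as column names.
ds1 = pd.read_csv(sample_csv, header=None, names=columns)

X = ds1[columns[:-1]]   # the four numeric feature columns
y = ds1['ClassLabel']   # string class labels, left unencoded

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

pred = clf.predict(X)   # predictions come back as the original strings
train_accuracy = accuracy_score(y, pred)
print(train_accuracy)
```

With the real file you would pass the URL to pd.read_csv instead of sample_csv and evaluate on a held-out split from train_test_split, since accuracy on the training rows says little about generalization.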

desertnaut
  • 100% correct, but it might also be worth mentioning that the specific error is due to the fact that sklearn models (classification or regression both) can't take string input, so the targets need to be encoded in some way to numeric types – G. Anderson Nov 06 '19 at 20:36
  • What do you suggest I use? And would it be possible to turn the ClassLabels into numbers, one for each of the strings, and then apply regression? – Sam Nov 06 '19 at 20:36
  • I see, thank you so much. Just for clarification: is accuracy_score equivalent to mean_absolute_error? – Sam Nov 06 '19 at 20:52
  • @desertnaut Can you please take a look at this question? https://stackoverflow.com/q/58900947/5904928 I have been struggling for hours and couldn't find an answer. – Aaditya Ura Nov 17 '19 at 13:50
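
To illustrate the encoding and the metric question raised in the comments above, here is a minimal sketch. It assumes LabelEncoder as the encoding step (one option among several), and the predicted list is made up purely for illustration. The integer codes are arbitrary identifiers, so the numeric distance between them carries no meaning; that is why regressing on them, or scoring them with mean_absolute_error, is not a substitute for classification and accuracy_score.

```python
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.preprocessing import LabelEncoder

labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']

# LabelEncoder assigns integer codes to the sorted class names.
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)
print(list(encoded))  # [0, 1, 2, 0]

# Hypothetical predictions: the last sample is misclassified
# as virginica (2) instead of setosa (0).
predicted = [0, 1, 2, 2]

acc = accuracy_score(encoded, predicted)        # fraction of exact matches
mae = mean_absolute_error(encoded, predicted)   # average distance between codes
print(acc)  # 0.75
print(mae)  # 0.5

# The codes can be mapped back to the original strings.
decoded = list(encoder.inverse_transform(predicted))
print(decoded)
```

The two metrics are not equivalent: accuracy counts how many predictions are exactly right (higher is better), while mean absolute error averages the size of the numeric miss (lower is better) and here depends on the arbitrary order of the codes.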