I am trying to detect anomalies in a breast cancer dataset using Isolation Forest in sklearn. I am applying Isolation Forest to a mixed data set, and fitting the model raises a ValueError.
This is my dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer/
This is my code:
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
# data_cancer is the DataFrame loaded from the dataset linked above
X = data_cancer.drop(['Class'], axis=1)
y = data_cancer['Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)
X_outliers = rng.uniform(low=-4, high=4, size=(X.shape[0], X.shape[1]))
clf = IsolationForest()
clf.fit(X_train)  # this call raises the ValueError
This is the error I get:
ValueError: could not convert string to float: '30-39'
Is it possible to use Isolation Forest on categorical data? If yes, how do I do so?
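For reference, here is a minimal sketch of one direction I am considering: one-hot encoding the string columns before fitting. This is only an assumption about what might work, not a confirmed solution. The `toy` DataFrame and its column names are hypothetical stand-ins for `data_cancer`, and I am not sure one-hot encoding is the best representation for a tree-based detector.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical stand-in for data_cancer: a few rows with string-valued columns.
toy = pd.DataFrame({
    "age": ["30-39", "40-49", "40-49", "60-69", "50-59", "30-39"],
    "menopause": ["premeno", "premeno", "ge40", "ge40", "ge40", "premeno"],
    "deg-malig": [3, 2, 2, 1, 2, 3],
})

# One-hot encode the string columns; the numeric column passes through unchanged.
X_encoded = pd.get_dummies(toy, columns=["age", "menopause"], dtype=float)

# With every column numeric, fit no longer hits the string-to-float conversion.
clf = IsolationForest(random_state=42)
clf.fit(X_encoded)

# predict returns 1 for inliers and -1 for outliers.
labels = clf.predict(X_encoded)
```

With a train/test split, the encoding would presumably have to be fitted on the training set and reused on the test set so that both end up with the same columns.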