I'm trying to make predictions on my own input data. I'm using Python's scikit-learn library with Isolation Forest as the algorithm. I don't know what I'm doing wrong, but whenever I try to transform my input data I get an error. The error occurs on this line:
input_par = encoder.transform(val)#ERROR
This is the error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I have also tried this, but I still get an error:
input_par = encoder.transform([val])#ERROR
This is the error: ValueError: Specifying the columns using strings is only supported for pandas DataFrames
What am I doing wrong, and how can I fix this error?
Also, should I use OneHotEncoder, LabelEncoder, or CountVectorizer?
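To make the comparison concrete, here is a minimal sketch of what each of the three produces on a toy list. The `texts` list is made up for illustration and is not my real data; the sketch only shows the output shapes and is not meant as a recommendation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data, only for illustration
texts = ['i love you', 'i like that', 'i love you']

# OneHotEncoder expects 2D input and treats each whole string as one category
onehot = OneHotEncoder().fit_transform(np.array(texts).reshape(-1, 1))
print(onehot.shape)  # one column per distinct string

# LabelEncoder works on a 1D array and maps each distinct string to an integer
labels = LabelEncoder().fit_transform(texts)
print(labels)

# CountVectorizer splits each string into word tokens and counts them
counts = CountVectorizer().fit_transform(texts)
print(counts.shape)  # one column per distinct token
```

With the first two, a sentence that was not seen during fitting has no encoding of its own, while CountVectorizer can still count any known words in it.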
This is my code:
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
textual_data = ['i love you', 'I love your dress', 'i like that', 'thats good', 'amazing', 'wrong', 'hi, how are you, are you doing good']
num_data = [4, 1, 3, 2, 65, 3,3]
df = pd.DataFrame({'my text': textual_data,
                   'num data': num_data})
x = df
# Transform the features
encoder = ColumnTransformer(transformers=[('onehot', OneHotEncoder(), ['my text'])], remainder='passthrough')
#encoder = ColumnTransformer(transformers=[('lab', LabelEncoder(), ['my text'])])
x = encoder.fit_transform(x)
isolation_forest = IsolationForest(contamination = 'auto', behaviour = 'new')
model = isolation_forest.fit(x)
list_of_val = [['good work',2], ['you are wrong',54], ['this was amazing',1]]
for val in list_of_val:
    input_par = encoder.transform(val)#ERROR
    outlier = model.predict(input_par)
    #print(outlier)
    if outlier[0] == -1:
        print('Values', val, 'are outliers')
    else:
        print('Values', val, 'are not outliers')
EDIT:
I have also tried this:
list_of_val = [['good work',2], ['you are wrong',54], ['this was amazing',1]]
for val in list_of_val:
    input_par = encoder.transform(pd.DataFrame({'my text': val[0],
                                                'num data': val[1]}))
But I get this error:
ValueError: If using all scalar values, you must pass an index
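As far as I can tell, this last error comes from pandas rather than scikit-learn: a dict whose values are all scalars gives `pd.DataFrame` no way to infer the number of rows. Below is a minimal sketch of building a one-row DataFrame by wrapping each value in a list. The small `train` frame is a stand-in for my real data, and `handle_unknown='ignore'` is an assumption that is not in my original code; I added it only so that text not seen during fitting does not raise an error in `transform`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Stand-in training data with the same column names as in the question
train = pd.DataFrame({'my text': ['amazing', 'wrong', 'thats good'],
                      'num data': [65, 3, 2]})

# handle_unknown='ignore' is an assumption, not part of the original code:
# unseen strings are encoded as all zeros instead of raising an error
encoder = ColumnTransformer(
    transformers=[('onehot', OneHotEncoder(handle_unknown='ignore'), ['my text'])],
    remainder='passthrough',
)
encoder.fit(train)

val = ['good work', 2]

# Wrapping each scalar in a list yields a DataFrame with exactly one row
row = pd.DataFrame({'my text': [val[0]], 'num data': [val[1]]})

encoded = encoder.transform(row)
print(encoded.shape)
```

Because the input is a DataFrame with the original column names, the string column selector `['my text']` in the ColumnTransformer can be resolved again.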