I need to filter some data that I loaded into a pandas DataFrame.
Here is the code leading up to my problem:
import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import LinearSVC
from sklearn.metrics import ConfusionMatrixDisplay
from scipy.io import arff
arquivo_arff = arff.loadarff(r"/content/Rice_MSC_Dataset.arff")
dados = pd.DataFrame(arquivo_arff[0])
dados = dados[['MINOR_AXIS', 'MAJOR_AXIS', 'CLASS']]
I've already made a scatterplot of all 5 classes with this code (no filters):
sns.scatterplot(
    data=dados,
    x="MINOR_AXIS",
    y="MAJOR_AXIS",
    hue="CLASS")
plt.show()
My problem: I need to keep only the species b'Basmati' and b'Ipsala', but I'm unable to do that and I don't know why.
The "CLASS" values are: b'Basmati', b'Arborio', b'Jasmine', b'Ipsala', b'Karacadag'
But in the ".arff" file that I used, the names are just "Basmati, Arborio, Jasmine, Ipsala, Karacadag"
What I've tried: filtering only these two species, with this code:
dados = dados[dados['CLASS'].isin(["" "b'Arborio'" "", "" "b'Ipsala'" ""])]
It didn't work. How can I fix this?
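
A likely cause, offered as a sketch rather than a confirmed diagnosis: scipy's loadarff returns nominal attributes as bytes, so the b'...' prefix is how Python displays a bytes value and is not part of the class name. A str such as "b'Ipsala'" therefore never matches the bytes value b'Ipsala'. The small DataFrame below is a made-up stand-in for the real file (the numeric values are invented for illustration), and the names no_match, filtered_bytes, dados_str and filtered_str are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real data (numeric values are invented).
# CLASS holds bytes objects, which is how loadarff returns nominal attributes.
dados = pd.DataFrame({
    "MINOR_AXIS": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MAJOR_AXIS": [10.0, 20.0, 30.0, 40.0, 50.0],
    "CLASS": [b"Basmati", b"Arborio", b"Jasmine", b"Ipsala", b"Karacadag"],
})

# A str that merely looks like the printed bytes value matches nothing.
no_match = dados[dados["CLASS"].isin(["b'Basmati'", "b'Ipsala'"])]

# Option 1: compare against bytes literals (note the b prefix outside the quotes).
filtered_bytes = dados[dados["CLASS"].isin([b"Basmati", b"Ipsala"])]

# Option 2: decode the column to str once, then filter with plain strings.
dados_str = dados.assign(CLASS=dados["CLASS"].str.decode("utf-8"))
filtered_str = dados_str[dados_str["CLASS"].isin(["Basmati", "Ipsala"])]

print(len(no_match), len(filtered_bytes), len(filtered_str))
```

Decoding once (option 2) may be the more convenient choice if the column is also used for plot legends, since the labels then show up without the b'...' wrapper. Swap in whichever class names are actually needed.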