Python error: could not convert string to float: 'male'

Question

I see that similar questions have been asked, but it doesn't look like those were caused by the same problem. Here is my code that gives the error:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor
from io import StringIO
d = pd.read_csv("http://www.stat.wisc.edu/~jgillett/451/data/kaggle_titanic_train.csv")
data = d[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']]
#print(data.head(n=7))
y = data.Survived
X = data[['Pclass','Sex','Age','SibSp','Parch']]
k = 3
knn = KNeighborsClassifier(n_neighbors=k, weights='distance', metric='euclidean')
knn.fit(X, y)

So I tried to convert it to float like this:

data.Sex=data[['Sex']].astype(float)

But that just gives the exact same error. Why is it not able to convert the string to float?

For starters, a non-numeric string such as 'male' cannot be converted to a float. Also, when sharing errors on SO, it's best to post the full traceback produced by the Python interpreter. — MedoAlmasry, Mar 31 '23 at 02:53
You can convert that categorical string column to numeric values with one-hot encoding or label encoding. — biyazelnut, Mar 31 '23 at 02:56

Corralien · Answer 1 · 2023-03-31T03:05:27.393

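astype(float) fails because 'male' and 'female' are labels, not numeric strings, so you have to map them to numbers yourself. You can use replace or pd.factorize: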

data['Sex'] = data['Sex'].replace({'male': 0, 'female': 1})

# OR

data['Sex'] = pd.factorize(data['Sex'])[0]

Output:

>>> data
     Survived  Pclass  Sex   Age  SibSp  Parch
0           0       3    0  22.0      1      0
1           1       1    1  38.0      1      0
2           1       3    1  26.0      0      0
3           1       1    1  35.0      1      0
4           0       3    0  35.0      0      0
..        ...     ...  ...   ...    ...    ...
886         0       2    0  27.0      0      0
887         1       1    1  19.0      0      0
888         0       3    1   NaN      1      2
889         1       1    0  26.0      0      0
890         0       3    0  32.0      0      0

[891 rows x 6 columns]
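
For comparison, here is a minimal sketch of the encodings mentioned in this thread, run on a tiny made-up Series rather than the real file. It uses map instead of replace for the explicit mapping, and pd.get_dummies as one possible way to do the one-hot encoding suggested in the comments. The sample values and variable names are assumptions for illustration only.

```python
import pandas as pd

# Tiny stand-in for the Titanic 'Sex' column (illustrative data, not the real file).
sex = pd.Series(['male', 'female', 'female', 'male'], name='Sex')

# astype(float) tries to parse each string as a number, so 'male' raises ValueError.
try:
    sex.astype(float)
    cast_failed = False
except ValueError:
    cast_failed = True

# Explicit mapping: you decide which label gets which code.
mapped = sex.map({'male': 0, 'female': 1})

# factorize: codes follow the order of first appearance ('male' comes first here).
codes, uniques = pd.factorize(sex)

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(sex, prefix='Sex', dtype=int)
```

Note that factorize numbers the labels in the order it meets them, so the codes can differ between datasets, whereas an explicit mapping stays fixed.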

Important note

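To prevent SettingWithCopyWarning when assigning to data['Sex'], make data an independent DataFrame rather than a slice of d. Either load only the columns you need or take an explicit copy: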

url = 'http://www.stat.wisc.edu/~jgillett/451/data/kaggle_titanic_train.csv'
cols = ['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']
data = pd.read_csv(url, usecols=cols)

# OR

data = d[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']].copy()
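
Putting the pieces together, the following is a rough end-to-end sketch, not a definitive fix. It assumes a small made-up DataFrame in place of the CSV so that it runs offline. It also assumes that rows with a missing Age can simply be dropped: the output above shows a NaN in Age, and KNeighborsClassifier rejects NaN input, but whether dropping or imputing is appropriate depends on your analysis.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Small made-up frame with the same columns as the question (not the real data).
d = pd.DataFrame({
    'Survived': [0, 1, 1, 1, 0, 0],
    'Pclass':   [3, 1, 3, 1, 3, 2],
    'Sex':      ['male', 'female', 'female', 'female', 'male', 'male'],
    'Age':      [22.0, 38.0, 26.0, 35.0, None, 27.0],
    'SibSp':    [1, 1, 0, 1, 0, 0],
    'Parch':    [0, 0, 0, 0, 0, 0],
    'Name':     ['a', 'b', 'c', 'd', 'e', 'f'],  # extra column, left out below
})

cols = ['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch']
data = d[cols].copy()  # independent copy, so the assignment below does not warn

# Encode the string labels as integers.
data['Sex'] = data['Sex'].map({'male': 0, 'female': 1})

# Assumption: drop rows with a missing Age, since the classifier rejects NaN.
data = data.dropna(subset=['Age'])

X = data[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch']]
y = data['Survived']

knn = KNeighborsClassifier(n_neighbors=3, weights='distance', metric='euclidean')
knn.fit(X, y)
predictions = knn.predict(X)
```

Because data is a copy, the original d keeps its string Sex column.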
