How to convert string columns to numeric values without getting NaN values

Question

I have columns of strings that I need to convert to numeric values. I used the code below, and unfortunately the fillna method doesn't help in this case.

How can I fix the problem?

Here's the head()

data['country_txt'] = data['country_txt'].astype('float64') 
data['city'] = data['city'].astype('float64')

I expected a normal result, but the actual output is entirely filled with NaN values:

country_txt 0 non-null float64
city 0 non-null float64

At the beginning I had this information: country_txt 170350 non-null object city 169904 non-null object — Amamra, Apr 05 '19 at 19:31
`country_txt` is a string, for example, `Mexico`. What do you expect "Mexico" `.astype(int)` to become? — rafaelc, Apr 05 '19 at 19:43
I expect each country to be assigned an arbitrary numeric value. Do you have a suggested solution, please? — Amamra, Apr 05 '19 at 19:49
Related, possible dupe: [Label encoding across multiple columns in scikit-learn](https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn) — cs95, Apr 05 '19 at 19:49

score 0 · Accepted Answer · answered Apr 05 '19 at 19:46

Apparently, you need to map your strings to integer representations.

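There are several ways to do that.

# Option 1: pandas.factorize (missing values are encoded as -1)
import pandas as pd
df['country_as_int'] = pd.factorize(df['country_txt'])[0]

# Option 2: scikit-learn's LabelEncoder
from sklearn.preprocessing import LabelEncoder
f = LabelEncoder()
df['country_as_int'] = f.fit_transform(df['country_txt'])

# Option 3: numpy.unique with return_inverse
import numpy as np
df['country_as_int'] = np.unique(df['country_txt'], return_inverse=True)[-1]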

rafaelc

Thanks a lot for the answer, but when I used this I got an error: TypeError: '<' not supported between instances of 'float' and 'str' – Amamra Apr 05 '19 at 20:03
@Amamra then use option #1. As an alternative, you can also enforce `df['country_txt'].astype(str)` so that the `NaN`s become strings. – rafaelc Apr 05 '19 at 20:05
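
The TypeError in the comment above most likely comes from missing values: a `NaN` is a float, and approaches that sort the column cannot compare it with strings. Here is a minimal sketch of two ways around that. The sample frame and the `"Unknown"` placeholder are made up for illustration and are not from the original data; filling with a placeholder is a swapped-in variant of the `astype(str)` suggestion, not the commenter's exact method.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real data, with one missing city.
df = pd.DataFrame({"city": ["Paris", np.nan, "Lima", "Paris"]})

# pd.factorize tolerates missing values: codes follow order of first
# appearance, and missing entries get the sentinel code -1.
codes, uniques = pd.factorize(df["city"])
df["city_as_int"] = codes

# np.unique sorts its input, so a float NaN mixed with strings fails.
# Replacing NaN with a placeholder string (an assumption here) avoids that.
filled = df["city"].fillna("Unknown")
df["city_as_int_sorted"] = np.unique(filled, return_inverse=True)[1]

print(df)
```

With `pd.factorize`, `uniques[code]` maps a non-negative code back to its label, so the original strings are recoverable if they are needed later.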
