I have columns of strings that I need to convert to numeric values. I used the code below, and unfortunately the `fillna` method doesn't work in this case.

How can I fix the problem?

Here's the head()

Head()

data['country_txt'] = data['country_txt'].astype('float64') 
data['city'] = data['city'].astype('float64') 

I expected a normal result, but the actual output is entirely filled with NaN values:

country_txt    0 non-null float64
city           0 non-null float64

Jonathan Leffler
Amamra
  • Initially I had this information: `country_txt 170350 non-null object`, `city 169904 non-null object` – Amamra Apr 05 '19 at 19:31
  • We need more information... post `data.head()` – rafaelc Apr 05 '19 at 19:32
  • I edited the post; you can find it under Head() – Amamra Apr 05 '19 at 19:41
  • `country_txt` is a string, for example, `Mexico`. What do you expect "Mexico" `.astype(int)` to become? – rafaelc Apr 05 '19 at 19:43
  • I expect an arbitrary numeric value to be assigned to each country. Do you have a proposed solution, please? – Amamra Apr 05 '19 at 19:49
  • Related, possible dupe: [Label encoding across multiple columns in scikit-learn](https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn) – cs95 Apr 05 '19 at 19:49

1 Answer

Apparently, you need to map your strings to integer representations.

There are many different ways to do that.

1. `pd.factorize`

df['country_as_int'] = pd.factorize(df['country_txt'])[0]

2. `LabelEncoder`

from sklearn.preprocessing import LabelEncoder
f = LabelEncoder()
df['country_as_int'] = f.fit_transform(df['country_txt'])

3. `np.unique`

import numpy as np
df['country_as_int'] = np.unique(df['country_txt'], return_inverse=True)[1]
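
To show how the resulting codes differ, here is a minimal sketch on a made-up four-row frame (the sample values are an assumption, not the asker's data). `pd.factorize` numbers labels in order of first appearance, while `np.unique` sorts the labels first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real country column.
df = pd.DataFrame({"country_txt": ["Mexico", "Iraq", "Mexico", "Peru"]})

# factorize assigns codes in order of first appearance.
codes, uniques = pd.factorize(df["country_txt"])
# codes -> [0, 1, 0, 2]; uniques holds 'Mexico', 'Iraq', 'Peru'

# np.unique sorts the labels, so codes follow alphabetical order.
labels, inverse = np.unique(df["country_txt"].to_numpy(), return_inverse=True)
# labels -> ['Iraq', 'Mexico', 'Peru']; inverse -> [1, 0, 1, 2]

# The mapping is reversible: indexing the uniques with the codes
# recovers the original strings.
restored = uniques[codes]
```

Either way the integers are arbitrary identifiers, so their ordering should not be read as meaningful.
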
rafaelc
  • Thanks a lot for the answer, but when I used this I got an error: `TypeError: '<' not supported between instances of 'float' and 'str'` – Amamra Apr 05 '19 at 20:03
  • @Amamra then use option #1. Alternatively, you can force `df['country_txt'].astype(str)` so that the `NaN`s become strings. – rafaelc Apr 05 '19 at 20:05
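
The error most likely comes from the missing values: `NaN` is a float, and sorting it against strings fails. Below is a rough sketch of two ways around that, using a made-up `city` sample. Filling with the placeholder `"Unknown"` is an assumption used here in place of `astype(str)`, so the missing-value label is explicit rather than the literal text `nan`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing city, mirroring the nulls in the question.
df = pd.DataFrame({"city": ["Baghdad", np.nan, "Lima", "Baghdad"]})

# Option 1: factorize does not sort, and marks missing values with -1.
city_codes = pd.factorize(df["city"])[0]
# city_codes -> [0, -1, 1, 0]

# Alternative: replace NaN with a placeholder string first, so every
# value is a str and the sort inside np.unique can compare them.
filled = df["city"].fillna("Unknown")
sorted_codes = np.unique(filled.to_numpy(), return_inverse=True)[1]
# sorted_codes -> [0, 2, 1, 0]  ("Baghdad" < "Lima" < "Unknown")
```

With the first approach, missing rows stay distinguishable as `-1`; with the second, they become an ordinary category of their own.
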