1

I have a column called 'Country' that contains quite a lot of '?' values. I tried to convert them to NaN, but the values are not changing. This is my code:

df['Country'].value_counts()

United-States      29170
Mexico             643
?                  583
Philippines        198
Germany            137


df[df['Country']=='?'] = np.nan

df['Country'].isnull().sum()

0

I also tried the replace function:

df['Country'].replace('?', np.nan)

And I also tried

df = pd.read_csv('train.csv', na_values=['?'])

Even if I try to print all the rows where the Country value is '?', it gives an empty DataFrame. I don't know how to solve this. Can someone please help me?

Thanks

Mayank Porwal
sarah444
  • Agree with not using `inplace=True`. Apparently the operations often are not actually `inplace` and can lead to unexpected behavior. See https://stackoverflow.com/a/60020384/5666087 for example – jkr Sep 28 '20 at 04:16

2 Answers

1

It looks like there is whitespace around your ?. You need to strip it and then apply replace, assigning the result back to the column. Calling replace with inplace=True on the stripped copy would only modify that temporary copy, not df:

In [848]: df
Out[848]: 
         Country  values
0  United-States   29170
1         Mexico     643
2              ?     583
3    Philippines     198
4        Germany     137

In [849]: df['Country'] = df['Country'].str.strip().replace('?', np.nan)

In [850]: df
Out[850]: 
         Country  values
0  United-States   29170
1         Mexico     643
2            NaN     583
3    Philippines     198
4        Germany     137
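
A minimal sketch of the strip-then-replace idea, assuming the '?' entries carry a leading space. The sample frame is made up for illustration and is not the asker's data. It first shows why an exact comparison with '?' finds nothing, then assigns the cleaned column back rather than relying on inplace=True:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: note the leading space in ' ?' and ' Mexico'
df = pd.DataFrame({
    "Country": ["United-States", " Mexico", " ?", "Philippines", " ?"],
    "values": [29170, 643, 583, 198, 137],
})

# An exact comparison finds nothing because the stored value is ' ?', not '?'
exact_matches = (df["Country"] == "?").sum()

# Strip surrounding whitespace, replace '?' with NaN, and assign the result back
df["Country"] = df["Country"].str.strip().replace("?", np.nan)

missing = df["Country"].isna().sum()
print(exact_matches, missing)
```

Checking df['Country'].unique() before cleaning is a quick way to see whether stray whitespace is present.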
Mayank Porwal
  • I have tried your suggestion but it's still not working. Thank you for the help though. – sarah444 Sep 28 '20 at 03:42
  • @sarah444 Looks like there are some whitespaces along with your `?`. Please check my updated answer. – Mayank Porwal Sep 28 '20 at 04:12
  • It's still not working. Maybe there is something wrong with the file, as I have tried several other things and none of them work. Is it okay if I share my CSV file so that you can take a look at it? Thank you – sarah444 Sep 28 '20 at 12:19
  • Please don't run all the commands. Just run my command on your data frame. It should work. Please share your csv as well. – Mayank Porwal Sep 28 '20 at 12:29
  • 'http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data' I am only running your code. – sarah444 Sep 28 '20 at 12:51
0
df['Country'].replace(['?'], np.nan, inplace=True)

You forgot to call replace with inplace=True (or to assign the result back to the column). By default, replace returns a new Series, so the change was not reflected in df.
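
If the values still do not change, the cells may not be exactly '?'. Below is a sketch of handling this at load time, assuming the file separates fields with a comma followed by a space. The two-column extract and its header are hypothetical and only stand in for the real file:

```python
import io

import pandas as pd

# Hypothetical extract in a ", "-separated style; not the real file contents
raw = "age, Country\n39, United-States\n50, ?\n38, Mexico\n"

# skipinitialspace drops the blank after each comma, so ' ?' is read as '?'
# and can then be matched by na_values
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True, na_values=["?"])

n_missing = df["Country"].isna().sum()
print(n_missing)
```

Without skipinitialspace=True, the cell would be read as ' ?', which na_values=['?'] would not match.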

Ujjwal Agrawal