
I am new to Python and am learning a few things.

I have a dataset whose values are stored as strings. A list, columns, contains the names of the columns I am working with.

columns = ['median', 'p25th', 'p75th']

In this dataset, numbers are stored as strings. Some entries in these columns are not numbers and are represented as UN, like this:

['110000' '75000' '73000' '70000' '65000' 'UN' '62000']

['95000' '55000' '50000' '43000' 'UN' '31500' '48000']

['125000' '90000' '105000' '80000' '75000' '102000' 'UN' '109000']

I need to replace UN with NaN using np.nan. I used this code below:

for column in columns:
    recent_grads.loc[column =='UN', column] = np.nan

But I keep getting this error:

Traceback (most recent call last):
  File "", line 15, in <module>
    recent_grads.loc[column =='UN', column] = np.nan
  File "", line 194, in __setitem__
    self._setitem_with_indexer(indexer, value)
  File "", line 332, in _setitem_with_indexer
    key, _ = convert_missing_indexer(idx)
  File "", line 2049, in convert_missing_indexer
    raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'

Can you please tell me where I am going wrong? Sorry if this sounds too basic.

ALollz
Keerthana

1 Answer


You could use the pandas DataFrame.replace method, as shown here:

Data

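import pandas as pd

# The first row holds the column names; each remaining row holds one column's values
d = [['median', 'p25th', 'p75th'],
     ['110000','75000','73000','70000','65000','UN','62000'],
     ['95000','55000','50000','43000','UN','31500','48000'],
     ['125000','90000','80000','75000','102000','UN','109000']
    ]
# Transpose the value rows so that each list becomes a column
recent_grads = pd.DataFrame(list(zip(*d[1:])), columns=d[0])
print(recent_grads)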

   median  p25th   p75th
0  110000  95000  125000
1   75000  55000   90000
2   73000  50000   80000
3   70000  43000   75000
4   65000     UN  102000
5      UN  31500      UN
6   62000  48000  109000

Code

import numpy as np
columns = ['median', 'p25th', 'p75th']
recent_grads[columns] = recent_grads[columns].replace('UN', np.nan)
print(recent_grads)

   median  p25th   p75th
0  110000  95000  125000
1   75000  55000   90000
2   73000  50000   80000
3   70000  43000   75000
4   65000    NaN  102000
5     NaN  31500     NaN
6   62000  48000  109000
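
Note that after the replacement the remaining values are still strings. If numeric columns are the end goal, one possible follow-up is pd.to_numeric with errors="coerce". This is only a sketch on a small made-up frame, not the asker's real data, and it assumes every non-numeric entry should become NaN: coercion turns any unparseable string into NaN, not just 'UN'.

```python
import pandas as pd

# Small illustrative frame (made-up subset, not the full dataset)
recent_grads = pd.DataFrame({
    "median": ["110000", "UN", "62000"],
    "p25th": ["95000", "31500", "UN"],
})
columns = ["median", "p25th"]

# Parse each column as numbers; anything unparseable (here 'UN') becomes NaN
numeric = recent_grads[columns].apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric)
```

Because NaN is a float, the converted columns come out as floating point rather than integer.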
edesz
  • This works! But the website I am working with wants me to use a for loop of the form for column in ____: recent_grads[columns]=recent_grads[column=____,column], where we have to fill in the blanks with code. – Keerthana Nov 05 '18 at 16:43
  • In that case the [answer posted by @IanS](https://stackoverflow.com/questions/53157088/conditionally-replacing-missing-values-in-pandas/53157357?noredirect=1#comment93206107_53157088) is correct. This was also suggested by @Vishnudev [here](https://stackoverflow.com/questions/53157088/conditionally-replacing-missing-values-in-pandas/53157357?noredirect=1#comment93204750_53157088). – edesz Nov 05 '18 at 17:12
  • Yes, now I've got it! I accidentally misplaced the square brackets. Sorry, and thank you for helping me out. – Keerthana Nov 05 '18 at 17:16
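
For readers who land here with the same KeyError, the loop form discussed in the comments can be sketched as below. In the question's code, column is just the column name (a string), so column == 'UN' evaluates to a single False rather than a row-by-row mask, which is what .loc complains about. Comparing the column's values, recent_grads[column], yields the boolean Series that .loc expects. The small frame here is made up for illustration and is not the asker's full dataset.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (made-up subset, not the full dataset)
recent_grads = pd.DataFrame({
    "median": ["110000", "75000", "UN", "62000"],
    "p25th": ["95000", "UN", "31500", "48000"],
    "p75th": ["125000", "90000", "UN", "109000"],
})
columns = ["median", "p25th", "p75th"]

# Comparing the column *name* to 'UN' gives one bool, not a mask
print("median" == "UN")

for column in columns:
    # Compare the column's *values* to 'UN' to get one bool per row
    mask = recent_grads[column] == "UN"
    # Assign NaN only where the mask is True
    recent_grads.loc[mask, column] = np.nan

print(recent_grads)
```

This gives the same NaN placement as the replace call in the answer, just one column at a time.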