
Here is a snapshot of the CSV data file.

I want to replace the null, or 'nan', values with a 0 and replace all other entries with a 1 in the column 'Death Year':

import pandas as pd
import numpy as np
mydata_csv = pd.read_csv('D:\Python\character-deaths.csv',sep = ',',encoding = 'utf-8')
mydata_csv
del mydata_csv['Book of Death']
del mydata_csv['Death Chapter']

if mydata_csv['Death Year'] == np.nan:
 mydata_csv['Death Year'] = 0
else:
 mydata_csv['Death Year'] = 1

The above code produces the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

  • Very common. What about the other answers that cover this doesn't apply here? https://stackoverflow.com/q/36921951/1531971 –  Sep 30 '17 at 13:45

4 Answers


You have two problems:

  1. A logical operation on a Series/DataFrame does not yield a scalar result. It yields a vector (a boolean Series), which if cannot reduce to a single True or False (see the sketch after this list).

  2. NaN != NaN; your if condition will never hold true even if the values are NaN.

    In [9]: np.nan == np.nan
    Out[9]: False
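
To see both points concretely, here is a minimal sketch that uses a hypothetical toy Series in place of the real Death Year column:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 299.0, np.nan])   # toy stand-in for the 'Death Year' column

print(s == np.nan)   # element-wise comparison: a boolean Series whose entries are all False
# 0    False
# 1    False
# 2    False
# dtype: bool

# Handing that Series to `if` is exactly what raises
# "ValueError: The truth value of a Series is ambiguous."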
    

Just use np.where.

mydata_csv['Death Year'] = np.where(mydata_csv['Death Year'].isnull(), 0, 1)
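
On a small assumed sample (not the actual CSV) the effect looks like this:

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 299.0, 300.0, np.nan])   # assumed sample values
np.where(s.isnull(), 0, 1)
# array([0, 1, 1, 0])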

Another improvement I'd recommend is using df.drop when deleting columns. Instead of del, try the more idiomatic pandas version:

mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
cs95

You didn't specify which line, but I suspect that your problem is in

if mydata_csv['Death Year'] == np.nan:

If so, try checking whether the column has data first, something along the lines of

if mydata_csv['Death Year'] is not None and mydata_csv['Death Year'] == np.nan:

Hope that helps

yatabani

I think it is better to use notnull for a boolean mask and then cast it to int, so True becomes 1 and False becomes 0:

When working with missing data it is necessary to use special functions like isnull or notnull; check the docs for more information.

# omit sep=',' because comma is the default separator
mydata_csv = pd.read_csv('D:\Python\character-deaths.csv', encoding = 'utf-8')
# drop both columns with a single drop call instead of two dels
mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
mydata_csv['Death Year'] = mydata_csv['Death Year'].notnull().astype(int)

Sample:

mydata_csv = pd.DataFrame({'Book of Death':[4,5,4,5,5,4],
                           'Death Chapter':[7,8,9,4,2,3],
                           'Death Year':[np.nan,3,5,np.nan,1,0],
                           'col':[7,8,9,4,2,3]})

print (mydata_csv)   
   Book of Death  Death Chapter  Death Year  col
0              4              7         NaN    7
1              5              8         3.0    8
2              4              9         5.0    9
3              5              4         NaN    4
4              5              2         1.0    2
5              4              3         0.0    3

mydata_csv = mydata_csv.drop(['Book of Death', 'Death Chapter'], axis=1)
mydata_csv['Death Year'] = mydata_csv['Death Year'].notnull().astype(int)
print (mydata_csv)   
   Death Year  col
0           0    7
1           1    8
2           1    9
3           0    4
4           1    2
5           1    3
jezrael

See df.fillna() & df.replace()

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html
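
As a minimal sketch of how fillna and replace could be combined for this task (using an assumed toy Series, and assuming 0 never occurs as a real death year):

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 299.0, 300.0, np.nan])   # assumed sample of 'Death Year'

filled = s.fillna(0)                            # missing years become 0
years = list(filled[filled != 0].unique())      # the remaining (non-zero) year values
result = filled.replace(years, 1).astype(int)   # map every real year onto 1

print(result.tolist())
# [0, 1, 1, 0]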

Mo. Atairu