How to create dummy variables in Pandas (Python 2.7) has been asked many times, but I don't know of a robust and fast solution yet. Consider this dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, -1, np.nan, 'rh']})
df
Out[9]:
A
0 1
1 2
2 -1
3 NaN
4 rh
Yes, it has mixed types. This happens all the time with big datasets (I have millions of rows).
I need to create dummy variables that are 1 if a condition is true and 0 otherwise. I assumed that whenever Pandas cannot perform the logical comparison (say, checking whether a string is larger than some real number), I would get a zero. Look at what happens instead:
df['dummy2'] = (df.A > 0).astype(int)
df['dummy1'] = np.where(df.A > 0, 1, 0)
df
Out[12]:
A dummy2 dummy1
0 1 1 1
1 2 1 1
2 -1 0 0
3 NaN 0 0
4 rh 1 1
Clearly this is problematic: the row containing the string 'rh' is flagged as 1 in both dummy columns. What is happening here? How can I prevent these false flags?
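To be concrete about the result I am after, here is a minimal sketch of one possible approach. It assumes a pandas version that provides pd.to_numeric with errors='coerce', and the column name 'dummy' is just a placeholder. I have not checked whether this is fast enough for millions of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, -1, np.nan, 'rh']})

# Assumption: anything that cannot be parsed as a number (such as 'rh')
# should be treated as missing, so coerce it to NaN first.
numeric_a = pd.to_numeric(df['A'], errors='coerce')

# Comparisons against NaN evaluate to False, so both the original NaN
# and the coerced string end up as 0 in the dummy column.
df['dummy'] = (numeric_a > 0).astype(int)

print(df)
```

With this, only the rows holding 1 and 2 would be flagged, while -1, NaN, and 'rh' would all get 0.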
Many thanks!