I've imported data. The empty fields show as nan. Datatypes for columns are a mix of float, string, object etc. I wanted to replace 'na' with 'N/A' making the replacement case insensitive. I used the following code from python: better way to handle case sensitivities with df.replace to do this:
# replace NA w N/A
dfMSR = dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))
When I run the above code, dtype for all columns changes to "object". This creates many problems, including the one below:
a = dfMSR.copy()
a = a[['AppBaselineType', 'RvwBaselineTypeAction', 'RvwBaselineType']]
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'].isnull()) |
(a['RvwBaselineTypeAction'] == '-') |
(a['RvwBaselineTypeAction'] == 'N/A')),
a['RvwBaselineType'], a['RvwBaselineTypeAction'])
The nans are not replaced by values in RvwBaselineType because they've changed to the actual text 'nan'.
a.describe() #provides the result:
AppBaselineType RvwBaselineTypeAction RvwBaselineType
count 292 292 292
unique 4 4 4
top nan nan Existing
freq 251 251 154
print(dfMSR['RvwBaselineTypeAction'].isnull().sum()) #provides the result:
0
#replace isnull() with == nan gives the desired output
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'] == 'nan') |
(a['RvwBaselineTypeAction'] == '-') |
(a['RvwBaselineTypeAction'] == 'N/A')),
a['RvwBaselineType'], a['RvwBaselineTypeAction'])
I'd like to ideally run the replace without changing (losing original) datatypes. Any suggestions?
#raw data:
RvwBaselineType RvwBaselineTypeAction AppBaselineType
Existing nan nan
Existing - nan
nan nan nan
Existing N/A nan
Existing ABC nan
#desired output
RvwBaselineType RvwBaselineTypeAction AppBaselineType
Existing nan Existing
Existing - Existing
nan nan nan
ESDffogr N/A ESDffogr
Existing ABC ABC
Can share a sample file if someone can tell me how to do this on SO.
Thanks