0

I've imported data. The empty fields show as nan. Datatypes for columns are a mix of float, string, object etc. I wanted to replace 'na' with 'N/A' making the replacement case insensitive. I used the following code from python: better way to handle case sensitivities with df.replace to do this:

# replace NA w N/A
dfMSR = dfMSR.apply(lambda x: x.astype(str).str.replace(r'\bna\b', 'N/A', regex=True,case=False))

When I run the above code, dtype for all columns changes to "object". This creates many problems, including the one below:

a = dfMSR.copy()
a = a[['AppBaselineType', 'RvwBaselineTypeAction', 'RvwBaselineType']]
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'].isnull()) | 
                                 (a['RvwBaselineTypeAction'] == '-') | 
                                 (a['RvwBaselineTypeAction'] == 'N/A')), 
                                a['RvwBaselineType'], a['RvwBaselineTypeAction'])

The nans are not replaced by values in RvwBaselineType because they've changed to the actual text 'nan'.

a.describe() #provides the result:

       AppBaselineType RvwBaselineTypeAction RvwBaselineType
count              292                   292             292
unique               4                     4               4
top                nan                   nan        Existing
freq               251                   251             154

print(dfMSR['RvwBaselineTypeAction'].isnull().sum()) #provides the result:

0

#replace isnull() with == nan gives the desired output
a['AppBaselineType'] = np.where(((a['RvwBaselineTypeAction'] == 'nan') | 
                                 (a['RvwBaselineTypeAction'] == '-') | 
                                 (a['RvwBaselineTypeAction'] == 'N/A')), 
                                a['RvwBaselineType'], a['RvwBaselineTypeAction'])

I'd like to ideally run the replace without changing (losing original) datatypes. Any suggestions?

#raw data:
RvwBaselineType RvwBaselineTypeAction   AppBaselineType
Existing        nan                     nan
Existing        -                       nan
nan             nan                     nan
Existing        N/A                     nan
Existing        ABC                     nan 

#desired output
RvwBaselineType RvwBaselineTypeAction   AppBaselineType
Existing        nan                     Existing        
Existing        -                       Existing        
nan             nan                     nan
ESDffogr        N/A                     ESDffogr        
Existing        ABC                     ABC


Can share a sample file if someone can tell me how to do this on SO. 
Thanks
SModi
  • 125
  • 14

0 Answers0