Need to remove nan from a column when its concatenated with strings using pandas

Question

My data looks like this:

First_Name Middle_Name Last_Name Full_Name (Header)
John       R           Rovin     John R Rovin
Marano                 Justine   Marano nan Justine
David                  Rose      David nan  Rose
                                 nan nan nan
Robert                           Robert nan nan

I am trying to trim nan from Full_Name column to just get whatever name it possibly contains as is. When I am trying to use Fillna(' ') , its not helping me to trim but its completely removing the column content. My final DF should look something like this:

First_Name Middle_Name Last_Name Full_Name (Header)

John       R           Rovin     John R Rovin
Marano                 Justine   Marano Justine
David                  Rose      David Rose
                                 
Robert                           Robert

I am heavily dependent for most of the operations on Pandas. So is there any way using pandas I can solve this problem?

check this post for removing nan substrings: https://stackoverflow.com/questions/13682044/remove-unwanted-parts-from-strings-in-a-column or this post for joining the columns without having nan in the first place: https://stackoverflow.com/questions/33158417/pandas-combine-two-strings-ignore-nan-values — amy989, Mar 18 '22 at 09:54

score 1 · Answer 1 · answered Mar 18 '22 at 09:57

You can use:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(i for i in x if pd.notna(i)), axis=1)
print(df)

# Output
  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert

Setup a MRE

import pandas as pd
import numpy as np

data = {'First_Name': ['John', 'Marano', 'David', np.nan, 'Robert'],
        'Middle_Name': ['R', np.nan, np.nan, np.nan, np.nan],
        'Last_Name': ['Rovin', 'Justine', 'Rose', np.nan, np.nan]}
df = pd.DataFrame(data)

score 0 · Accepted Answer · answered Mar 18 '22 at 09:58

0

if nan always appear the same in Full_Name column you can use this:

df['Full_Name (Header)'] = df['Full_Name (Header)'].str.replace('nan', ' ')

answered Mar 18 '22 at 09:58

Ali Halataee

68
7

nan is in str, this change worked fine for me. Thank you @Ali Halataee – kishore Mar 18 '22 at 10:22

score 0 · Answer 3 · answered Mar 18 '22 at 10:13

Use Series.dropna for remove misisng values before join:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(x.dropna()), axis=1)
print(df)

  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert

Need to remove nan from a column when its concatenated with strings using pandas

3 Answers3