0

My data looks like this:

First_Name Middle_Name Last_Name Full_Name (Header)
John       R           Rovin     John R Rovin
Marano                 Justine   Marano nan Justine
David                  Rose      David nan  Rose
                                 nan nan nan
Robert                           Robert nan nan

I am trying to trim nan from Full_Name column to just get whatever name it possibly contains as is. When I am trying to use Fillna(' ') , its not helping me to trim but its completely removing the column content. My final DF should look something like this:

First_Name Middle_Name Last_Name Full_Name (Header)

John       R           Rovin     John R Rovin
Marano                 Justine   Marano Justine
David                  Rose      David Rose
                                 
Robert                           Robert

I am heavily dependent for most of the operations on Pandas. So is there any way using pandas I can solve this problem?

kishore
  • 51
  • 4
  • What is your code for join columns? – jezrael Mar 18 '22 at 09:52
  • check this post for removing nan substrings: https://stackoverflow.com/questions/13682044/remove-unwanted-parts-from-strings-in-a-column or this post for joining the columns without having nan in the first place: https://stackoverflow.com/questions/33158417/pandas-combine-two-strings-ignore-nan-values – amy989 Mar 18 '22 at 09:54

3 Answers3

1

You can use:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(i for i in x if pd.notna(i)), axis=1)
print(df)

# Output
  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert

Setup a MRE

import pandas as pd
import numpy as np

data = {'First_Name': ['John', 'Marano', 'David', np.nan, 'Robert'],
        'Middle_Name': ['R', np.nan, np.nan, np.nan, np.nan],
        'Last_Name': ['Rovin', 'Justine', 'Rose', np.nan, np.nan]}
df = pd.DataFrame(data)
Corralien
  • 109,409
  • 8
  • 28
  • 52
0

if nan always appear the same in Full_Name column you can use this:

df['Full_Name (Header)'] = df['Full_Name (Header)'].str.replace('nan', ' ')
0

Use Series.dropna for remove misisng values before join:

cols = ['First_Name', 'Middle_Name', 'Last_Name']
df['Full_Name'] = df[cols].apply(lambda x: ' '.join(x.dropna()), axis=1)
print(df)

  First_Name Middle_Name Last_Name       Full_Name
0       John           R     Rovin    John R Rovin
1     Marano         NaN   Justine  Marano Justine
2      David         NaN      Rose      David Rose
3        NaN         NaN       NaN                
4     Robert         NaN       NaN          Robert
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252