
Let's say I have a sample of names split across separate fields:

indx  First Name   Middle Name     Last Name
0     CHARITIXAN   K.R.,           NICHOLS
1           None   Johnny-Boy      CHAVEZ
2          ISAAC   None            ESPARZA
3        MICHAEL   nan             
4         Andrew                   Pfaff

Let's also assume the data are in a pandas DataFrame (`df`) and that enough cleaning (via the `.replace` method) has been done that every remaining value is either a non-empty string or an empty string:

indx  First Name   Middle Name     Last Name
0     CHARITIXAN   K.R.,           NICHOLS
1                  Johnny-Boy      CHAVEZ
2          ISAAC                   ESPARZA
3        MICHAEL               
4         Andrew                   Pfaff
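For anyone who wants to reproduce this, here is a minimal sketch that builds the raw sample and cleans it. The question does not show its exact `.replace` calls, so this assumes the placeholders are real `None`/`NaN` values or the literal strings `'None'` and `'nan'`:

```python
import numpy as np
import pandas as pd

# Raw sample mirroring the first table (assumed mix of None, NaN and placeholder strings)
raw = pd.DataFrame({
    'First Name': ['CHARITIXAN', None, 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', 'None', np.nan, ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', '', 'Pfaff'],
})

# Turn placeholder strings into real missing values, then fill every missing value with ''
df = raw.replace(['None', 'nan'], np.nan).fillna('')
print(df)
```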

I want to combine all parts of each name with ONLY a single space between segments. Based on my research, the best solution I found was the one that uses `re`. Is this the optimal approach, or is there something better for this particular case?

My final approach was this:

```python
import re
df['full_name'] = df[['First Name', 'Middle Name', 'Last Name']].apply(lambda x: re.sub(' +', ' ', ' '.join(x)).strip(), axis=1)  # strip() drops the edge space left by an empty first/last name
```
nate
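One possible alternative to the `apply` + `re` version, sketched here on the cleaned sample (rebuilt so the block runs on its own), is to let pandas concatenate the columns with `Series.str.cat` and then normalize whitespace with `.str.split().str.join(' ')`. Calling `split()` with no separator splits on runs of whitespace and drops empty pieces, so leading, trailing, and doubled spaces all disappear without a regex:

```python
import pandas as pd

# Cleaned sample: every value is a non-empty string or ''
df = pd.DataFrame({
    'First Name': ['CHARITIXAN', '', 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', '', '', ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', '', 'Pfaff'],
})
cols = ['First Name', 'Middle Name', 'Last Name']

# Join the columns element-wise with one space (empty parts leave extra spaces)...
joined = df[cols[0]].str.cat(df[cols[1:]], sep=' ')
# ...then split on whitespace and rejoin, which discards the empty pieces
df['full_name'] = joined.str.split().str.join(' ')
print(df['full_name'].tolist())
# ['CHARITIXAN K.R., NICHOLS', 'Johnny-Boy CHAVEZ', 'ISAAC ESPARZA', 'MICHAEL', 'Andrew Pfaff']
```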
  • Can't you just add them together? `df['full_name']=df['First Name'] +' ' + df['Middle Name'] + ' ' + df['Last Name']` – Kenan Jan 20 '20 at 14:47
  • @kenan that's not "ONLY a single space" if the middle or last name is empty. – anishtain4 Jan 20 '20 at 14:52
  • Assuming `names` is a list of your columns: `df[names].apply(lambda x : x.str.cat(sep=' '),axis=1)` – Umar.H Jan 20 '20 at 15:07
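Building on the `str.cat` comment: with empty strings it still produces doubled spaces, but `Series.str.cat` omits missing values when `na_rep` is left as `None` and no `others` are passed. So one option, a sketch that assumes you are free to keep blanks as `NaN` instead of `''`, is to convert them back just before joining:

```python
import numpy as np
import pandas as pd

# Cleaned sample: every value is a non-empty string or ''
df = pd.DataFrame({
    'First Name': ['CHARITIXAN', '', 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', '', '', ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', '', 'Pfaff'],
})
names = ['First Name', 'Middle Name', 'Last Name']

# Represent blank parts as NaN so str.cat skips them instead of joining ''
blanks_as_nan = df[names].replace('', np.nan)
df['full_name'] = blanks_as_nan.apply(lambda x: x.str.cat(sep=' '), axis=1)
print(df['full_name'].tolist())
```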

2 Answers


You can apply `join` row-wise:

```python
df['full_name'] = df[['First Name','Middle Name', 'Last Name']].apply(lambda x: ' '.join(x), axis=1)
```
anishtain4
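As the comments note, `' '.join(x)` on its own still leaves a doubled or leading space wherever a part is an empty string. A small variation, sketched on the cleaned sample and assuming blanks are exactly `''` (not whitespace), skips the empty parts before joining, since an empty string is falsy:

```python
import pandas as pd

# Cleaned sample: every value is a non-empty string or ''
df = pd.DataFrame({
    'First Name': ['CHARITIXAN', '', 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', '', '', ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', '', 'Pfaff'],
})
cols = ['First Name', 'Middle Name', 'Last Name']

# Keep only the non-empty parts of each row, then join them with a single space
df['full_name'] = df[cols].apply(lambda x: ' '.join(part for part in x if part), axis=1)
print(df['full_name'].tolist())
# ['CHARITIXAN K.R., NICHOLS', 'Johnny-Boy CHAVEZ', 'ISAAC ESPARZA', 'MICHAEL', 'Andrew Pfaff']
```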

You can use this:

```python
df['full_name'] = df.apply(lambda row: row['First Name'] + ' ' + row['Middle Name'] + ' ' + row['Last Name'], axis=1)
```