Let's say I take a sample of names such as these, split into separate fields:
indx First Name Middle Name Last Name
0 CHARITIXAN K.R., NICHOLS
1 None Johnny-Boy CHAVEZ
2 ISAAC None ESPARZA
3 MICHAEL nan
4 Andrew Pfaff
Let's also assume the data are in a pandas DataFrame (df) and that enough cleaning (via the .replace method) has been done that every remaining value is either a non-empty string or an empty string:
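A minimal sketch of that cleaning step, assuming the missing cells are a mix of real None/NaN values and the literal strings 'None' and 'nan' (which is what astype(str) produces from them). The sample above does not show which columns the blanks in rows 3 and 4 belong to, so their placement here is a guess.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample; the column placement of the
# blanks in rows 3 and 4 is assumed.
df = pd.DataFrame({
    'First Name': ['CHARITIXAN', None, 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', 'None', np.nan, ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', 'nan', 'Pfaff'],
})

# fillna('') turns real None/NaN into empty strings; replace() then
# blanks out the literal placeholder strings 'None' and 'nan'.
df = df.fillna('').replace(['None', 'nan'], '')
print(df)
```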
indx First Name Middle Name Last Name
0 CHARITIXAN K.R., NICHOLS
1 Johnny-Boy CHAVEZ
2 ISAAC ESPARZA
3 MICHAEL
4 Andrew Pfaff
I want to combine all parts of a given name with ONLY a single space between name segments. Based on my research and testing, the best solution I found was the one that uses re (regular expressions). Is this the optimal way, or is there something better for this particular case?
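To make the requirement concrete, here is a plain-Python sketch on two of the cleaned rows (row 3 is assumed to have an empty middle and last name). Joining every field leaves stray spaces wherever a field is empty, while skipping empty strings before joining gives exactly one space between the remaining parts. The helper names are illustrative only.

```python
def naive_join(parts):
    # Joins every field, so empty fields leave stray spaces behind.
    return ' '.join(parts)

def join_nonempty(parts):
    # Skips empty strings, so exactly one space separates the kept parts.
    return ' '.join(p for p in parts if p)

rows = [
    ['', 'Johnny-Boy', 'CHAVEZ'],  # empty first name
    ['MICHAEL', '', ''],           # empty middle and last name (assumed)
]
for parts in rows:
    print(repr(naive_join(parts)), '->', repr(join_nonempty(parts)))
```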
My final approach was this:
import re
df['full_name'] = df[['First Name', 'Middle Name', 'Last Name']].apply(lambda x: re.sub(' +', ' ', ' '.join(x)).strip(), axis=1)
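Two regex-free alternatives, sketched under the assumption that every cell is already a non-empty or an empty string (the blank placement in rows 3 and 4 is again a guess). Option A drops empty fields per row before joining; Option B concatenates the columns with Series.str.cat and then normalizes spacing with str.split() and str.join(' ').

```python
import pandas as pd

# Cleaned sample (every cell is a non-empty or an empty string); the
# placement of the blanks in rows 3 and 4 is assumed.
df = pd.DataFrame({
    'First Name': ['CHARITIXAN', '', 'ISAAC', 'MICHAEL', 'Andrew'],
    'Middle Name': ['K.R.,', 'Johnny-Boy', '', '', ''],
    'Last Name': ['NICHOLS', 'CHAVEZ', 'ESPARZA', '', 'Pfaff'],
})
cols = ['First Name', 'Middle Name', 'Last Name']

# Option A: per row, drop empty fields before joining (no regex needed).
df['full_name'] = df[cols].apply(lambda row: ' '.join(p for p in row if p), axis=1)

# Option B: concatenate the columns with str.cat, then split on whitespace
# (which discards empty pieces) and rejoin with single spaces.
joined = df[cols[0]].str.cat(df[cols[1:]], sep=' ')
df['full_name_b'] = joined.str.split().str.join(' ')

print(df[['full_name', 'full_name_b']])
```

Option B also collapses whitespace inside a single field (for example a double space within a name), which Option A leaves alone. Both still loop over the values in Python for object-dtype columns, so neither is guaranteed to beat the regex version; timing them on the real data is the reliable way to choose. Collapsing runs of spaces with re.sub alone still leaves one leading or trailing space when the first or last field is empty, so that approach needs a final .strip().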