df with imperfect data:
df = pd.DataFrame({'A Surname' : ['Smith', 'Longshore', 'Jones'],
'A Title': ['Mr', 'Miss', np.nan],
'B Surname' : ['Smith', np.nan, 'Nguyen'],
'B Title': ['Mrs', np.nan, np.nan]})
I'm after a column containing a string appropriate to address both A and B where possible. Combining fields returns np.nan if any are np.nan, and it needs to be logical (eg. don't use 'B Title' if 'B Surname' is np.nan), so I need a series of rules to determine the most appropriate combination. My unsuccessful approach:
def combined(x):
full = df['A Title'] + ' ' & df['A Surname'] & ' & ' & df['B Title'] & ' ' & df['B Surname']
no_title = df['A Surname'] & ' & ' & df['B Surname']
# more combinations
if full != np.nan:
return full
elif no_title != np.nan:
return no_title
# more elifs
else:
return df['A Surname']
df['combined string'] = np.nan
df['combined string'] = df['combined string'].apply(combined)
The desired output would look like:
desired_df = pd.DataFrame({'A Surname' : ['Smith', 'Longshore', 'Jones'],
'A Title': ['Mr', 'Miss', 'Mr'],
'B Surname' : ['Smith', np.nan, 'Whatever'],
'B Title': ['Mrs', np.nan, np.nan],
'combined string': ['Mr Smith & Mrs Smith', 'Miss Longshore', 'Jones & Whatever']})
What's a practical way to do this?