Use pd.merge(). See this example:
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
'first_name': ['Alex', 'Amy', 'Allen', 'Alice', None]} ,
columns = ['subject_id', 'first_name'])
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
'first_name': [None, 'Brian', 'Bran', 'Bryce', 'Betty']},
columns = ['subject_id', 'first_name'])
Using pd.merge()
do the outer join on 'subject_id':
df_merge = pd.merge(df_a, df_b, on='subject_id', how='outer')
and you get:
subject_id first_name_x first_name_y
0 1 Alex NaN
1 2 Amy NaN
2 3 Allen NaN
3 4 Alice None
4 5 None Brian
5 6 NaN Bran
6 7 NaN Bryce
7 8 NaN Betty
You can use where
to conditionally assign a value from the _x and _y columns:
df_merge['first_name'] = df_merge['first_name_x'].where(df_merge['first_name_x'].notnull(), df_merge['first_name_y'])
Using .drop()
method to delete _x and _y columns:
df_res = df_merge.drop(['first_name_x', 'first_name_y'], axis=1)
And get the result you are looking for:
print(df_res)
subject_id first_name
0 1 Alex
1 2 Amy
2 3 Allen
3 4 Alice
4 5 Brian
5 6 Bran
6 7 Bryce
7 8 Betty
If you don't know the name of the columns that will be generated from the merge, you can use this code:
suffixes = ('__x', '__y')
df_merge = pd.merge(df_a, df_b, on='subject_id', how='outer', suffixes=suffixes)
drop_cols = []
for col in df_merge.columns:
if col.endswith(suffixes[0]):
new_col = col[:-len(suffixes[0])]
df_merge[new_col] = df_merge[col].where(df_merge[col].notnull(), df_merge[new_col + suffixes[1]])
drop_cols.extend([col, new_col + suffixes[1]])
df_merge.drop(columns=drop_cols, inplace=True)
Note: For a more detailed explanation you can refer to the official documentation, while here you will find many examples.