0

I'm trying to merge 2 dataframes by name where one column is name + last name and the other only first name initial + last name.

Example:

df1:

    name
John Doe

df2:

  name
J. Doe
mozway
  • 194,879
  • 13
  • 39
  • 75
Omar B
  • 45
  • 7

1 Answers1

1

You can use a regex to change the format of df1['name'] and feed this to merge:

df1.merge(df2,
          left_on=df1['name'].str.replace(r'^(.)\w+', r'\1.', regex=True),
          right_on='name'
         )

NB. check the documentation of merge for more parameters to format the output, in particular suffixes to customize the column names.

output:

     name    name_x  name_y
0  J. Doe  John Doe  J. Doe

example input:

df1 = pd.DataFrame({'name': ['John Doe']})
df2 = pd.DataFrame({'name': ['J. Doe']})
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks! That helped. I was trying to do this for the last couple of days. The expected output was J. Doe. – Omar B May 02 '22 at 08:59