1

Dataframe One

    number   email          name
0   1234     me@mail.com    Me
1   5678     you@mail.com   You
2   9012     us@mail.com    Us
3   3456     them@mail.com  Them

Dataframe Two

    email         name    open  click
0   you@mail.com  You     31    7  
1   them@mail.com Them    84    15
2   me@mail.com   Me      6     1
3   us@mail.com   Us      0     4

I would like to combine the two dfs so I end up with one df only that combines the two:

Desired Output:

    number   email          name   open  click
0   1234     me@mail.com    Me     6     1
1   5678     you@mail.com   You    31    7
2   9012     us@mail.com    Us     0     4
3   3456     them@mail.com  Them   84    15

What is confusing me is how to ensure the data in columns 'open' and 'click' from dataframe two matches up correctly when combined with dataframe one as the 'email' and 'name' columns are in a different order in each dataframe.

Ed Jefferies
  • 165
  • 9

1 Answers1

1

If you mean by a different order, that the rows do not match at the indexes of the second dataframe that you gave as an example. Just merge by email.

df3 = pd.merge(df1, df2, on= 'email',  suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Or do a left-side merge on the left of the first dataframe and on the right of the second. And if there are no rows to merge in the second dataframe, then there will be empty rows, and the rows of the first will be all.

df3 = pd.merge(df1, df2, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Update

If I understand you correctly. You need to merge, but you have different values in the open, click columns. Made two dataframes. So that there are no duplicate columns, I use the prefix '_y' for the second duplicate and filter the duplicate filter(regex='^(?!.*_y)' on it. To save the necessary columns, I renamed them (so that they do not fall into duplicates).

df3 = pd.DataFrame(
    {'number': [1234, 5678, 9012, 3456], 'email': ['me@mail.com', 'you@mail.com', 'us@mail.com', 'them@mail.com'],
     'name': ['Me', 'You', 'Us', 'Them'], 'open': [6, 31, 0, 84], 'click': [1, 7, 4, 15]})

df4 = pd.DataFrame(
    {'number': [3456, 5678, 9012, 1234], 'email': ['them@mail.com', 'you@mail.com', 'us@mail.com', 'me@mail.com'],
     'name': ['Them', 'You', 'Us', 'Me'], 'open': [1, 2, 3, 4], 'click': [4, 3, 2, 1]})


df3.rename(columns={'open': ' open_df3', 'click': 'click_df3'}, inplace=True)
df4.rename(columns={'open': ' open_df4', 'click': 'click_df4'}, inplace=True)

df5 = pd.merge(df3, df4, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
inquirer
  • 4,286
  • 2
  • 9
  • 16
  • 1
    Thank you! If I wanted to add another df (call it df4) to df3 afterwards - df4 is in same format to df2 but with different values in the 'Open' and 'Click' columns - how would I merge it so that 'Open' and 'Click' were the only columns that merge? So effectively I would be taking df3 + df4 and wanting to end up with columns that are: Number/Email/Name/Open/Click/Open/Click ? – Ed Jefferies Apr 19 '23 at 13:44
  • @EdJefferies see updates. If it's not then show df4 and the expected result. – inquirer Apr 19 '23 at 14:32