1

I have two dataframes that have the following columns : Phone, Email and Name

Dataframe1 has 20k in length, whereas dataframe2 has 1k length. I would like to fill the blanks in the Phone column in dataframe1 with the phone numbers in dataframe2 using the email as a match index between the two dataframes.

What is the best way to do this? I have tried combine_frist() and Merge() but combine_first() returns the value in the same row rather than the value that matches the email address. Merge() resulted in the same thing.

Am I wrong to think I need to set email as an index and then map phones to that index? I feel like this is correct but I simply do not know how to do this. Any help is appreciated! Thank you :)

Example :

In [1] 
import pandas as pd
df1 = pd.DataFrame({'Phone': [1, NaN, 3, 4, 5, NaN, 7], 
                   'Name': ['Bob', 'Jon', 'Iris', 'Jacob','Donald','Beatrice','Jane'],
                   'Email': ['bob@gmail.com','jon@gmail.com','iris@gmail.com','jacob@gmail.com','donald@gmail.com','beatrice@gmail.com','jane@gmail.cm'})

df2 = pd.DataFrame({'Phone': [2, 1, 3, 5],
                  'Name': ['Jon', 'Bob', 'Donald'],
                  'Email': ['jon@gmail.com','bob@gmail.com', 'donald@gmail.com'})
In [2]: df1 
Out [2]:
Phone  Name  Email
1      Bob   bob@gmail.com
NaN    Jon   jon@gmail.com
3      Iris  iris@gmail.com
4      Jac   jacob@gmail.com
5      Don   donald@gmail.com
NaN    Bea   beatrice@gmail.com
7      Jane  jane@gmail.com
x 20000 len
In [3]: df2
Out [3]:
Phone  Name  Email
2      Jon   jon@gmail.com
1      Bob   bob@gmail.com
6      Bea   beatrice@gmail.com
5      Don   donald@gmail.com
x 1100 len

What I've tried

In [4]: df3 = pd.merge(df1,df2, on="Email", how="left")
Out [4]:
Phone  Name  Email
1      Bob   bob@gmail.com
1      Jon   jon@gmail.com
3      Iris  iris@gmail.com
4      Jac   jacob@gmail.com
5      Don   donald@gmail.com
NaN    Bea   beatrice@gmail.com
7      Jane  jane@gmail.com

In [5]: df3 = df1.combine_first(df2)
Out [5]:
Phone  Name  Email
1      Bob   bob@gmail.com
1      Jon   jon@gmail.com
3      Iris  iris@gmail.com
4      Jac   jacob@gmail.com
5      Don   donald@gmail.com
NaN    Bea   beatrice@gmail.com
7      Jane  jane@gmail.com

What I would like it to look like:

In [6]: df3
Out [6]
1      Bob   bob@gmail.com
2      Jon   jon@gmail.com
3      Iris  iris@gmail.com
4      Jac   jacob@gmail.com
5      Don   donald@gmail.com
6      Bea   beatrice@gmail.com
7      Jane  jane@gmail.com
New Dev
  • 48,427
  • 12
  • 87
  • 129

1 Answers1

0

Constructing the data frame like so:

df1 = pd.DataFrame({'Phone': [1, NaN, 3, 4, 5, NaN, 7], 
                   'Name': ['Bob', 'Jon', 'Iris', 'Jacob','Donald','Beatrice','Jane'],
                   'Email': ['bob@gmail.com','jon@gmail.com','iris@gmail.com','jacob@gmail.com','donald@gmail.com','beatrice@gmail.com','jane@gmail.cm']})

df2 = pd.DataFrame({'Phone': [2, 1, 5, 6],
                  'Name': ['Jon', 'Bob', 'Donald', 'Beatrice'],
                  'Email': ['jon@gmail.com','bob@gmail.com', 'donald@gmail.com', 'beatrice@gmail.com']})

The merge gives:

>>> df1.merge(df2, on='Email', how='left')
   Phone_x    Name_x               Email  Phone_y    Name_y
0      1.0       Bob       bob@gmail.com      1.0       Bob
1      NaN       Jon       jon@gmail.com      2.0       Jon
2      3.0      Iris      iris@gmail.com      NaN       NaN
3      4.0     Jacob     jacob@gmail.com      NaN       NaN
4      5.0    Donald    donald@gmail.com      5.0    Donald
5      NaN  Beatrice  beatrice@gmail.com      6.0  Beatrice
6      7.0      Jane       jane@gmail.cm      NaN       NaN

Then reduce Phone over columns.

>>> df1.merge(df2, on='Email', how='left')[['Phone_x', 'Phone_y']].ffill(axis=1)
   Phone_x  Phone_y
0      1.0      1.0
1      NaN      2.0
2      3.0      3.0
3      4.0      4.0
4      5.0      5.0
5      NaN      6.0
6      7.0      7.0

Reassign the right-most column in that result - if output is assigned to result, access by result.iloc[:, -1] - as a new column to the original data frame.

ifly6
  • 5,003
  • 2
  • 24
  • 47