
I have a sample dataframe, df1:

id  user_id     name             email       
1     1        John         John@example.com
2     2        Alves        alves@example.com
3     3        Kristein     kristein@example.com
4     4        James        james@example.com

The second dataframe, df2:

id      user       user_email_1            user_email_2         status
1      Sanders     sanders@example.com                          active
2      Alves       alves111@example.com   alves@example.com     active
3      Micheal     micheal@example.com                          active
4      James       james@example.com                            delete

How can I add the status data from df2 to df1 when all of the following match, and drop the records that don't match?

user_id of df1 matches id of df2

name of df1 matches user of df2

email of df1 matches either user_email_1 or user_email_2 of df2

Desired result (df1):

id   user_id    name       email                status
2      2        Alves     alves@example.com     active
4      4        James     james@example.com     delete

For example, alves@example.com in df1 matches user_email_2 in df2, so the status of that row (active) is appended.
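One way to express these rules directly, as a minimal sketch that assumes the empty email cells in df2 are NaN: inner-merge on the two exact keys (user_id/id and name/user), then keep only the rows whose email equals either email column. Rows with no match on the first two keys are dropped by the inner merge, and the boolean mask drops the rest.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty email cells are assumed to be NaN
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user_id': [1, 2, 3, 4],
    'name': ['John', 'Alves', 'Kristein', 'James'],
    'email': ['John@example.com', 'alves@example.com',
              'kristein@example.com', 'james@example.com'],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user': ['Sanders', 'Alves', 'Micheal', 'James'],
    'user_email_1': ['sanders@example.com', 'alves111@example.com',
                     'micheal@example.com', 'james@example.com'],
    'user_email_2': [np.nan, 'alves@example.com', np.nan, np.nan],
    'status': ['active', 'active', 'active', 'delete'],
})

# Inner merge on the two exact-match keys; suffixes keep df1's 'id' name unchanged
merged = df1.merge(df2, how='inner', left_on=['user_id', 'name'],
                   right_on=['id', 'user'], suffixes=('', '_df2'))

# Keep a row if its email matches user_email_1 OR user_email_2 (NaN never matches)
mask = (merged['email'] == merged['user_email_1']) | (merged['email'] == merged['user_email_2'])
result = merged.loc[mask, ['id', 'user_id', 'name', 'email', 'status']].reset_index(drop=True)
print(result)
```

This avoids reshaping df2 at all; the trade-off is that the OR condition is written out by hand, so it grows with the number of email columns.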

Digital404
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Anurag Dabas Jul 30 '21 at 14:41
  • `df1.merge(df2,left_on=['user_id','name','email'],right_on=['id','user','user_email'],how='left').drop(['id_x','id_y','user'],1)` – Anurag Dabas Jul 30 '21 at 14:42

2 Answers


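You should use merge with an inner join. Because the email can be in either user_email_1 or user_email_2, merge once per email column and concatenate the matches:

pd.concat([df1.merge(df2, how='inner', left_on=['user_id', 'name', 'email'], right_on=['id', 'user', col], suffixes=('', '_df2'))
           for col in ['user_email_1', 'user_email_2']])[['id', 'user_id', 'name', 'email', 'status']]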
alparslan mimaroğlu

Reshape df2 so that it has a single user_email column, then merge the two dataframes and keep only the wanted columns:

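import pandas as pd

# Turn user_email_1/user_email_2 into a single user_email column (one row per email)
df2 = df2.set_index(['id', 'user', 'status']).stack() \
         .rename('user_email').reset_index()

# Inner merge (the default) on all three keys, then keep the wanted columns
out = pd.merge(df1, df2, left_on=['user_id', 'name', 'email'],
                         right_on=['id', 'user', 'user_email'],
                         suffixes=('', '2')) \
          [['id', 'user_id', 'name', 'email', 'status']]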
>>> out

   id  user_id   name              email  status
0   2        2  Alves  alves@example.com  active
1   4        4  James  james@example.com  delete
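A self-contained version of the stack-and-merge steps, as a sketch assuming the empty email cells in df2 are NaN. Whether stack() keeps or drops those NaN entries can depend on the pandas version, but either way a NaN never equals an email, so it cannot affect the merge. The stacked frame is stored under a new name (long_df2) so the original df2 stays intact.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty email cells are assumed to be NaN
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user_id': [1, 2, 3, 4],
    'name': ['John', 'Alves', 'Kristein', 'James'],
    'email': ['John@example.com', 'alves@example.com',
              'kristein@example.com', 'james@example.com'],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user': ['Sanders', 'Alves', 'Micheal', 'James'],
    'user_email_1': ['sanders@example.com', 'alves111@example.com',
                     'micheal@example.com', 'james@example.com'],
    'user_email_2': [np.nan, 'alves@example.com', np.nan, np.nan],
    'status': ['active', 'active', 'active', 'delete'],
})

# Stack the two email columns into one 'user_email' column (one row per email)
long_df2 = (df2.set_index(['id', 'user', 'status'])
               .stack()
               .rename('user_email')
               .reset_index())

# Inner merge on all three keys; df2's 'id' becomes 'id2', so df1's 'id' is kept
out = pd.merge(df1, long_df2, left_on=['user_id', 'name', 'email'],
               right_on=['id', 'user', 'user_email'],
               suffixes=('', '2'))[['id', 'user_id', 'name', 'email', 'status']]
print(out)
```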

Where does this use the column user_email_2?

The columns user_email_1 and user_email_2 are stacked into one column. After the transformation, df2 looks like:

>>> df2

   id     user  status       level_3            user_email
0   1  Sanders  active  user_email_1   sanders@example.com
1   2    Alves  active  user_email_1  alves111@example.com
2   2    Alves  active  user_email_2     alves@example.com
3   3  Micheal  active  user_email_1   micheal@example.com
4   4    James  delete  user_email_1     james@example.com
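The same reshape can also be written with DataFrame.melt, which names the email columns explicitly through value_vars. This is an alternative to stack(), not what the answer uses. melt keeps a row for every cell, including the empty ones, so this sketch (again assuming the empty cells are NaN) drops them with dropna; that step is optional because NaN never matches an email. The merge step stays the same, using user_email as the key.

```python
import numpy as np
import pandas as pd

# df2 from the question; empty email cells are assumed to be NaN
df2 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'user': ['Sanders', 'Alves', 'Micheal', 'James'],
    'user_email_1': ['sanders@example.com', 'alves111@example.com',
                     'micheal@example.com', 'james@example.com'],
    'user_email_2': [np.nan, 'alves@example.com', np.nan, np.nan],
    'status': ['active', 'active', 'active', 'delete'],
})

# One row per (id, user, status, email column); 'source' records which column it came from
long_df2 = df2.melt(id_vars=['id', 'user', 'status'],
                    value_vars=['user_email_1', 'user_email_2'],
                    var_name='source', value_name='user_email')

# Drop rows that came from empty (NaN) email cells
long_df2 = long_df2.dropna(subset=['user_email'])
print(long_df2.sort_values(['id', 'source']))
```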
Corralien
  • Where is it using the column `user_email_2` in your answer? – Digital404 Jul 30 '21 at 16:46
  • The columns `user_email_1` and `user_email_2` are stacked into one column. – Corralien Jul 30 '21 at 21:42
  • @Digital404. Is this what you expect? If so, and if the answer solved your problem, please don't forget to accept it. – Corralien Aug 01 '21 at 22:27
  • No, we can't stack the emails into a single column. I left a few cells empty here, but they may not be empty. In that case, stacking won't help. – Digital404 Aug 02 '21 at 17:20
  • Are you sure? The output is what you expect. Can you show me a case where this doesn't work? If you want to match `user_email_1` OR `user_email_2`, it doesn't matter whether the emails are in the same column. – Corralien Aug 02 '21 at 19:35
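To illustrate the point in the last comment, here is a sketch with hypothetical data (the names and addresses below are made up, not from the question) in which both email columns are filled in for every row. Stacking produces two rows per user, and the inner merge keeps a df1 row if its email equals either of them, which is the OR match asked for.

```python
import pandas as pd

# Hypothetical data: every df2 row has both email columns filled in
df1 = pd.DataFrame({
    'id': [1, 2, 3],
    'user_id': [1, 2, 3],
    'name': ['Alves', 'James', 'John'],
    'email': ['alves@example.com', 'james@example.com', 'john@example.com'],
})
df2 = pd.DataFrame({
    'id': [1, 2, 3],
    'user': ['Alves', 'James', 'John'],
    'user_email_1': ['alves111@example.com', 'james@example.com', 'john1@example.com'],
    'user_email_2': ['alves@example.com', 'james2@example.com', 'john2@example.com'],
    'status': ['active', 'delete', 'active'],
})

# Two rows per user after stacking: one for each email column
long_df2 = (df2.set_index(['id', 'user', 'status'])
               .stack()
               .rename('user_email')
               .reset_index())

# Alves matches through user_email_2, James through user_email_1, John not at all
out = pd.merge(df1, long_df2, left_on=['user_id', 'name', 'email'],
               right_on=['id', 'user', 'user_email'],
               suffixes=('', '2'))[['id', 'user_id', 'name', 'email', 'status']]
print(out)
```

One caveat: if a df2 row lists the same address in both columns, the merge returns that df1 row twice, which out.drop_duplicates() would remove.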