0

I have a dataframe df1 like below -

|email_id| date |
|abc@gmail.com | ['2022-04-09'] |
|ABC@gmail.com | [nan]
|def@gmail.com | ['2022-09-21','2022-03-09'] |
|ghi@gmail.com | [nan, '2022-03-29'] |
|jkl@gmail.com | [nan] |
|mnp@gmail.com | [nan,'2022-09-01']

Another df df2 -

|email_id| status |
|abc@gmail.com | 0 |
|def@gmail.com | 0 |
|ghi@gmail.com | 0 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |

How can I lookup email_id from df2 in df1 and update the status in df2? If we have the date values present in df1 date column , status for that email_id should be 0, and if we have any nan values present, the status should be 1. If some email_id from df2 doesn't match in df1 , will keep the status as same.

Expected output of df2 -

|email_id| status |
|abc@gmail.com | 1 |
|def@gmail.com | 0 |
|ghi@gmail.com | 1 |
|mno@gmail.com | 3 |
|pqr@gmail.com | 2 |
|MNP@gmail.com | 1 |

Please help me out. Thanks in advance!

deepu2711
  • 59
  • 1
  • 7

1 Answers1

1

First use DataFrame.explode for column from lists, then create compare for missing values with aggregate max for mapping Series, use Series.map with replace non matched values to original column df2['status']:

df = df1.explode('date')
s = df['date'].isna().astype(int).groupby(df['email_id'].str.lower()).max()
print (s)
email_id
abc@gmail.com    1
def@gmail.com    0
ghi@gmail.com    1
jkl@gmail.com    1
mnp@gmail.com    1
Name: date, dtype: int32

df2['status'] = df2['email_id'].str.lower().map(s).fillna(df2['status']).astype(int)
print (df2)
        email_id  status
0  abc@gmail.com       1
1  def@gmail.com       0
2  ghi@gmail.com       1
3  mno@gmail.com       3
4  pqr@gmail.com       2
5  MNP@gmail.com       1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • how can i convert the datatype of status from to python standard type int64 ? Can you please help? – deepu2711 Nov 04 '22 at 17:03
  • @deepu2711 what is reason? ` ` is for Series, `int64`is for value of same Series. – jezrael Nov 04 '22 at 17:09
  • I need to insert these values to Postgres DB, Please check this post - https://stackoverflow.com/questions/74314929/update-table-values-in-postgres-table-using-psycopg2-in-python?noredirect=1#comment131207069_74314929 – deepu2711 Nov 04 '22 at 17:11
  • not idea, maybe `int(df2['status'][i])` – jezrael Nov 04 '22 at 17:24
  • can you please explain how the `s` variable is being created in the solution above, how max() is giving 0,1 status in the df? @jezrael – deepu2711 Nov 09 '22 at 14:24