I will do my best to make my question as clear as possible :)
I have 2 dataframes which are basically emails:
df1 = all received emails
df2 = all sent emails
Both dataframes have the same structure:
mailbox; subject; received date; received time
What I am trying to achieve:
- Get the value of the column df2['received time'] where df2['subject'].str.contains(df1['subject'])
So far I have managed to identify replies but only by saying true or false using the following:
df1.assign(indf2 = df1['subject'].isin(df2['subject']))
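For illustration, here is a minimal sketch of a True/False flag based on substring matching rather than exact equality. `isin` only tests for exact matches, so a reply subject such as 're: python is the way' would not be matched against 'python is the way'. The names `sent_subjects`, `has_reply` and `flagged` are made up for this sketch, and the frames are cut down to the subject column only.

```python
import pandas as pd

# Reduced versions of the sample frames (subject column only)
df1 = pd.DataFrame({'subject': ['Christmas time is almost here',
                                'python is the way',
                                'java is old school']})
df2 = pd.DataFrame({'subject': ['re: easter is gone',
                                're: python is the way',
                                're: re: toward mongoDb']})

sent_subjects = df2['subject'].tolist()

# True when at least one sent subject contains the received subject
# as a plain substring (case-sensitive)
has_reply = df1['subject'].apply(
    lambda s: any(s in sent for sent in sent_subjects)
)

flagged = df1.assign(indf2=has_reply)
print(flagged)
```

This still only says whether a reply exists, not when it was sent.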
Samples:
import pandas as pd

df1 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['Christmas time is almost here', 'python is the way', 'java is old school'],
                    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020']})
df2 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['re: easter is gone', 're: python is the way',
                                're: re: toward mongoDb'],
                    'emaildate': ['24.12.2019', '16.01.2020', '05.11.2020']})
The output should be:
df3 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['Christmas time is almost here', 'python is the way', 'java is old school'],
                    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020'],
                    'replied_date': ['24.12.2019', '16.01.2020', 'noReplyFound']})
So basically it would be an additional dataframe with the values from df1 plus the information from df2 when available.
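To make the goal concrete, here is a rough sketch of one possible way to do the lookup, assuming a reply is any sent email whose subject contains the received subject as a plain substring, and that the first match is the one wanted. The helper `find_reply_date` is hypothetical and written just for this example. Note that on the sample data only 'python is the way' has a matching reply subject, so under this assumption the first row would come out as 'noReplyFound' rather than the '24.12.2019' listed in the expected output.

```python
import pandas as pd

df1 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['Christmas time is almost here',
                                'python is the way',
                                'java is old school'],
                    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020']})
df2 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['re: easter is gone',
                                're: python is the way',
                                're: re: toward mongoDb'],
                    'emaildate': ['24.12.2019', '16.01.2020', '05.11.2020']})


def find_reply_date(subject, sent, default='noReplyFound'):
    """Hypothetical helper: date of the first sent email whose subject
    contains `subject`, or `default` when there is none."""
    # regex=False treats the subject as literal text, so characters
    # like '?' or '(' in a subject are not interpreted as a pattern
    mask = sent['subject'].str.contains(subject, regex=False)
    matches = sent.loc[mask, 'emaildate']
    return matches.iloc[0] if not matches.empty else default


# One lookup per received email; fine for small frames,
# but it scans df2 once for every row of df1
df3 = df1.assign(
    replied_date=df1['subject'].apply(lambda s: find_reply_date(s, df2))
)
print(df3)
```

The row-by-row scan keeps the sketch simple; for large mailboxes, stripping the leading 're: ' prefixes and merging on the cleaned subject might be a faster alternative.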
Somehow I feel that I am missing a very small element, but I can't figure it out :(
Many thanks for your help