
I will do my best to make my question as clear as possible :)

I have 2 dataframes which are basically emails:

  • df1 = all received emails
  • df2 = all sent emails

Both dataframes have the same structure:

mailbox; subject; received date; received time

What I am trying to achieve:

  • For each row of df1, get the value of df2['received time'] from the row of df2 whose subject contains df1['subject'] (conceptually df2['subject'].str.contains(df1['subject']))

So far I have managed to identify replies, but only as a True/False flag, using the following:

df1.assign(indf2=df1['subject'].isin(df2['subject']))

Samples:

import pandas as pd

df1 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['Christmas time is almost here', 'python is the way', 'java is old school'],
                    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020']})

df2 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['re:  easter is gone', 're: python is the way', 're: re: toward mongoDb'],
                    'emaildate': ['24.12.2019', '16.01.2020', '05.11.2020']})

The output should be:

df3 = pd.DataFrame({'mailbox': ['perso', 'perso', 'perso'],
                    'subject': ['Christmas time is almost here', 'python is the way', 'java is old school'],
                    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020'],
                    'replied_date': ['24.12.2019', '16.01.2020', 'noReplyFound']})

So basically it would be an additional dataframe with the values from df1 plus the information from df2 when available.

Somehow I feel that I am missing a very small element, but I can't figure it out :(

Many thanks for your help

  • Can you give two small samples of your dataframes, then it is much easier to help. See here: https://stackoverflow.com/help/minimal-reproducible-example – oskros Nov 17 '20 at 08:16
  • See specifically [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) about how to make a reproducible `pandas` example. – BioGeek Nov 17 '20 at 08:19
  • Not sure, but maybe left join is necessary here. – jezrael Nov 17 '20 at 08:20
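
The left-join idea from the last comment could be sketched roughly as follows. This is only a sketch under some assumptions that are not stated in the question: replies are recognised by stripping leading "re:" prefixes from df2['subject'] and then matching the remaining subject exactly, and when a subject has several replies only the first one in row order is kept. The names `replies` and `replied_date` are chosen here for illustration. Note that with the samples exactly as posted, only 'python is the way' has a matching reply, so the first row ends up as 'noReplyFound' rather than '24.12.2019'.

```python
import pandas as pd

df1 = pd.DataFrame({
    'mailbox': ['perso', 'perso', 'perso'],
    'subject': ['Christmas time is almost here', 'python is the way', 'java is old school'],
    'emaildate': ['23.12.2019', '02.01.2020', '11.11.2020'],
})
df2 = pd.DataFrame({
    'mailbox': ['perso', 'perso', 'perso'],
    'subject': ['re:  easter is gone', 're: python is the way', 're: re: toward mongoDb'],
    'emaildate': ['24.12.2019', '16.01.2020', '05.11.2020'],
})

# Assumption: a reply's subject is the original subject with one or more
# leading "re:" prefixes. Strip them (case-insensitively) so both sides match.
cleaned_subject = df2['subject'].str.replace(
    r'^(?:\s*re:\s*)+', '', case=False, regex=True
)

# Keep only what is needed from df2 and rename the date column.
replies = (
    df2.assign(subject=cleaned_subject)[['subject', 'emaildate']]
       .rename(columns={'emaildate': 'replied_date'})
       # Assumption: if a subject has several replies, keep the first row only.
       .drop_duplicates(subset='subject')
)

# Left join keeps every row of df1; unmatched rows get NaN in replied_date.
df3 = df1.merge(replies, on='subject', how='left')
df3['replied_date'] = df3['replied_date'].fillna('noReplyFound')

print(df3)
```

If the reply subject can contain extra text around the original subject, an exact match after stripping prefixes will not be enough, and a substring comparison per row would be needed instead.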
