0

I am working with 2 pandas dataframes. They are linked by the MessageID. I would normally use a simple map, but if there are multiple hashes for the MessageID I would like to create a new row for each additional one. If there are multiple senders or recipients I would like additional rows as well. Is there a clever way to accomplish this.

df1
    MessageID   From      To            Outcome
0   105176627   sender1@  recipient1@   aborted
1   105176660   sender2@  recipient2@   delivered          
2   105176688   sender3@  recipient2@   delivered
3   105176688   sender3@  recipient3@   delivered
4   105176692   sender5@  recipient6@   aborted
5   105176858   sender4@  recipient6@   aborted      

df2
    MessageID   FileName        FileHash
0   105176627   you.pdf         4a352399b53c6a4a7f513b1ac8abb95761b0754e7be8
1   105176660   image002.png    c732a64515721f261579f74c77e307b07ba2ed4a6674
2   105176660   image004.png    28a5d5ba70ce941d1632f2f5366f6a13153c269ff1f1
3   105176660   image005.jpg    c9fb49a33f18924a02f550890c2e3b1d3148e5c585e9
4   105176688   image001.jpg    e663cb7d8c59bf591de64660822e9e4087d14f70de55
5   105176692   image001.png    d2cdcc83765b564d2fa6ba7a671c4efa21434c974b92
6   105176858   image.pdf       a2a6579d54edc2451b4ee2b47743e858be6b8c5452c2

Desired output:

df3
    MessageID   From      To           Outcome   FileName     FileHash
0   105176627   sender1@  recipient1@  aborted   you.pdf      4a352399b53c6a4a7f513b1ac8abb95761b0754e7be8
1   105176660   sender2@  recipient2@  delivered image002.png c732a64515721f261579f74c77e307b07ba2ed4a6674
2   105176660   sender2@  recipient2@  delivered image004.png 28a5d5ba70ce941d1632f2f5366f6a13153c269ff1f1
3   105176660   sender2@  recipient2@  delivered image005.jpg c9fb49a33f18924a02f550890c2e3b1d3148e5c585e9
4   105176688   sender3@  recipient2@  delivered image001.jpg e663cb7d8c59bf591de64660822e9e4087d14f70de55
5   105176688   sender3@  recipient3@  delivered image001.jpg e663cb7d8c59bf591de64660822e9e4087d14f70de55
6   105176692   sender5@  recipient6@   aborted  image001.png d2cdcc83765b564d2fa6ba7a671c4efa21434c974b92
7   105176858   sender4@  recipient6@   aborted  image.pdf    a2a6579d54edc2451b4ee2b47743e858be6b8c5452c2
sectechguy
  • 2,037
  • 4
  • 28
  • 61
  • 2
    As suggested by @jezrael, similar thing have been asked before. `df1.merge(df2, how='outer', on = 'MessageID')` should do the trick. Adding here just is csae you don't have time to research, but suggest going through the link put above. – moys Sep 10 '19 at 13:23
  • I didnt realize it was a simple as that, thank you @SH-SF – sectechguy Sep 10 '19 at 13:28

0 Answers0