
I have two dataframes like the ones below:

import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})

df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi'],
})

I want to iterate over the two dataframes and build a new dataframe using the following rule:

When a label in df2 matches a label in df1, that df2 record should be inserted above the matching df1 records. The final dataframe should look like this:

[image: expected output dataframe]

P.S.: I haven't worked out the logic yet, so I have no sample code to share. I am trying to use DataFrame.iterrows() for this.
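For comparison with the iterrows() idea, here is a minimal loop-based sketch. It assumes that rows sharing a label are contiguous in df1, and it drops any df2 row whose label never appears in df1. The names rows, seen and match are illustrative only. iterrows() is slow on large frames, so a vectorized approach is usually preferable.

```python
import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})
df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi'],
})

rows = []
seen = set()
for _, row in df1.iterrows():
    label = row['label']
    if label not in seen:
        # First df1 row with this label: emit every df2 row sharing the label above it.
        seen.add(label)
        for _, match in df2[df2['label'] == label].iterrows():
            rows.append(match.to_dict())
    rows.append(row.to_dict())

# Build the result from the collected dicts and put 'label' first.
out = pd.DataFrame(rows)[['label', 'sentence']]
print(out)
```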

Crusader

1 Answer


Use concat, then sort_values with a stable sort:

out = (pd.concat([df2, df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )

Output:

  label    sentence
0   abc  html_text1
1   abc  html_text2
2   abc       text1
3   abc       text2
4   abc       text3
5   def  html_text3
6   def       text1
7   ghi  html_text4
8   ghi       text1
9   ghi       text2
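One side effect to be aware of: sort_values orders the label groups alphabetically, not by where each label first appears in df1. With the question's data the two orders coincide. The minimal sketch below uses hypothetical data (not from the question) whose labels are not alphabetical to show the difference. It also shows one possible order-preserving variant that uses the key= parameter of sort_values (pandas 1.1+). The names order, alpha and by_df1 are illustrative.

```python
import pandas as pd

# Hypothetical data (not from the question): df1's labels are NOT in alphabetical order.
df1 = pd.DataFrame({'sentence': ['t1', 't2', 't3'], 'label': ['xyz', 'xyz', 'abc']})
df2 = pd.DataFrame({'sentence': ['h1', 'h2'], 'label': ['abc', 'xyz']})

combined = pd.concat([df2, df1])

# Plain stable sort: groups come out in alphabetical label order ('abc' before 'xyz').
alpha = combined.sort_values('label', kind='stable', ignore_index=True)

# Rank each label by first appearance in df1, then in df2 (for labels only in df2).
order = {lab: i for i, lab in enumerate(pd.unique(pd.concat([df1['label'], df2['label']])))}

# key= maps each label to its rank before sorting; kind='stable' keeps df2 rows above df1 rows.
by_df1 = combined.sort_values('label', key=lambda s: s.map(order),
                              kind='stable', ignore_index=True)

print(alpha[['label', 'sentence']])
print(by_df1[['label', 'sentence']])
```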

To keep only the df2 rows whose label also appears in df1:

out = (pd.concat([df2[df2['label'].isin(df1['label'])], df1])
         .sort_values('label', kind='stable', ignore_index=True)
         [['label', 'sentence']]
      )
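The two versions differ only in how they treat df2 rows whose label never appears in df1. To check this, the sketch below adds one hypothetical row (html_text5 with label jkl) to the question's df2. The helper interleave is a hypothetical name that wraps the concat + stable-sort expression from this answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})
# Hypothetical extra row 'html_text5' with label 'jkl', which never appears in df1.
df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4', 'html_text5'],
    'label': ['abc', 'abc', 'def', 'ghi', 'jkl'],
})

def interleave(first, second):
    """Stable-sort the concatenation so rows from `first` land above `second` within each label."""
    return (pd.concat([first, second])
              .sort_values('label', kind='stable', ignore_index=True)
              [['label', 'sentence']])

keep_all = interleave(df2, df1)                                       # keeps the 'jkl' row
matched_only = interleave(df2[df2['label'].isin(df1['label'])], df1)  # drops it

print(keep_all.tail(2))
print(matched_only.tail(2))
```

The first version keeps every df2 record, so a df2-only label forms its own group at its alphabetical position.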
mozway
  • This doesn't work if df2 contains a label that is not in df1. – tcotts Aug 26 '22 at 11:04
  • @tcotts this is not specified by the OP, but it is easily handled; let me add an update. – mozway Aug 26 '22 at 11:04
  • Thanks for the help @mozway. Can we have a solution even if df2 contains a label that is not present in df1? The final DF should contain all records from df2. Sorry, I forgot to put that in the OP. – Crusader Aug 26 '22 at 11:53
  • @Crusader check my answer carefully; I provided 2 alternatives. If neither works for you, please provide an example demonstrating that, along with the expected output. – mozway Aug 26 '22 at 12:13
  • Oh yes, the 1st alternative works. Can you explain which part of the code keeps "html_text" on top? I did not quite understand it. – Crusader Aug 26 '22 at 12:20
  • Never mind, I got the code! Thanks! – Crusader Aug 26 '22 at 12:30
  • @Crusader the fact that we `concat` with `df2` first **and** use a stable sort ;) – mozway Aug 26 '22 at 12:38
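To illustrate that last comment: a stable sort keeps rows with equal keys in their input order, so the row order produced by concat decides which rows come first within each label. The default kind='quicksort' makes no such guarantee. The minimal sketch below, using the question's data, simply swaps the concat order. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'sentence': ['text1', 'text2', 'text3', 'text1', 'text1', 'text2'],
    'label': ['abc', 'abc', 'abc', 'def', 'ghi', 'ghi'],
})
df2 = pd.DataFrame({
    'sentence': ['html_text1', 'html_text2', 'html_text3', 'html_text4'],
    'label': ['abc', 'abc', 'def', 'ghi'],
})

# df2 first: within each label, the html rows precede the df1 rows.
html_first = pd.concat([df2, df1]).sort_values('label', kind='stable', ignore_index=True)
# df1 first: the same stable sort now keeps the df1 rows on top.
text_first = pd.concat([df1, df2]).sort_values('label', kind='stable', ignore_index=True)

print(html_first.loc[html_first['label'] == 'abc', 'sentence'].tolist())
# ['html_text1', 'html_text2', 'text1', 'text2', 'text3']
print(text_first.loc[text_first['label'] == 'abc', 'sentence'].tolist())
# ['text1', 'text2', 'text3', 'html_text1', 'html_text2']
```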