
I have two pandas DataFrames. The first one contains reviews split into tokens (words), each with a score:

import pandas as pd
df = pd.DataFrame({
    "review_num": [2,2,2,1,1,1,1,1,3,3],
    "review": ["The second review","The second review","The second review",
               "This is the first review","This is the first review",
               "This is the first review","This is the first review",
               "This is the first review",'No Noo', 'No Noo'],
    "token_num":[1,2,3,1,2,3,4,5,1,2],
    "token":["The","second","review","This","is","the","first","review","No","Noo"],
    "score":[0.3,-0.6,0.4,0.5,0.6,0.7,-0.6,0.4,0.5,0.6]
})
print(df)

   review_num                    review  token_num   token  score
0           2         The second review          1     The    0.3
1           2         The second review          2  second   -0.6
2           2         The second review          3  review    0.4
3           1  This is the first review          1    This    0.5
4           1  This is the first review          2      is    0.6
5           1  This is the first review          3     the    0.7
6           1  This is the first review          4   first   -0.6
7           1  This is the first review          5  review    0.4
8           3                    No Noo          1      No    0.5
9           3                    No Noo          2     Noo    0.6

I drop the highest-scoring token of each review and get df2:

df2 = df.drop(df.groupby('review_num',sort=False)['score'].idxmax())
print(df2)

   review_num                    review  token_num   token  score
0           2         The second review          1     The    0.3
1           2         The second review          2  second   -0.6
3           1  This is the first review          1    This    0.5
4           1  This is the first review          2      is    0.6
6           1  This is the first review          4   first   -0.6
7           1  This is the first review          5  review    0.4
8           3                    No Noo          1      No    0.5
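To make the drop step concrete, here is a small sketch (the `review` column is omitted for brevity, and `max_labels` is just an illustrative name). It prints the index labels that `idxmax()` returns per review. Because `drop()` removes rows by label, the surviving rows keep their original index.

```python
import pandas as pd

# Same data as df above, without the "review" text column
df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "token_num": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "token": ["The", "second", "review", "This", "is", "the", "first", "review", "No", "Noo"],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6],
})

# Index label of the highest-scoring token within each review
# (sort=False keeps the groups in order of first appearance: 2, 1, 3)
max_labels = df.groupby("review_num", sort=False)["score"].idxmax()
print(max_labels.tolist())  # [2, 5, 9]

# Dropping by label leaves the remaining rows with their original labels
df2 = df.drop(max_labels)
print(df2.index.tolist())  # [0, 1, 3, 4, 6, 7, 8]
```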

The second DataFrame to merge, df_histo, contains the original review and a new_modified_review column:

   review_num                   review           new_modified_review
           2         The second review                          XXXX
           1  This is the first review                          YYYY
           3                    No Noo                           ZZZ
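For a reproducible example, df_histo can be built as below. This is a sketch that assumes exactly the three rows shown above; XXXX, YYYY and ZZZ are the placeholder values from the table.

```python
import pandas as pd

# One row per review, mapping review_num to its rewritten text
df_histo = pd.DataFrame({
    "review_num": [2, 1, 3],
    "review": ["The second review", "This is the first review", "No Noo"],
    "new_modified_review": ["XXXX", "YYYY", "ZZZ"],
})
print(df_histo)
```

Since both df2 and df_histo have a `review` column, merging on `review_num` alone will produce `review_x` and `review_y` (pandas' default suffixes).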

I merge df2 and df_histo with:

df_merged=df2.merge(df_histo, on='review_num', how='inner')
print(df_merged)

The values and rows are all correct, but the original index is lost.

I get the index 0, 1, 2, 3, 4, 5, 6, but I expected 0, 1, 3, 4, 6, 7, 8 (the same as df2). How can I keep df2's index after the merge?
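The pandas documentation for `merge` notes that when joining columns on columns, the DataFrame indexes are ignored, which is why the result gets a fresh 0..n-1 index. One common workaround, sketched here under the assumption that df2's index is unnamed (so `reset_index()` creates a column called `index`), is to move the index into a regular column, merge, then restore it. The names `plain` and `kept` are illustrative, and the `review` text columns are left out to keep the sketch short.

```python
import pandas as pd

df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "token": ["The", "second", "review", "This", "is", "the", "first", "review", "No", "Noo"],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6],
})
df2 = df.drop(df.groupby("review_num", sort=False)["score"].idxmax())

df_histo = pd.DataFrame({
    "review_num": [2, 1, 3],
    "new_modified_review": ["XXXX", "YYYY", "ZZZ"],
})

# Column-on-column merge ignores both indexes: the result is renumbered 0..n-1
plain = df2.merge(df_histo, on="review_num", how="inner")
print(plain.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]

# Workaround: turn the index into a column, merge, then set it back
kept = (
    df2.reset_index()                      # old labels move into column "index"
       .merge(df_histo, on="review_num", how="inner")
       .set_index("index")                 # restore the old labels as the index
)
kept.index.name = None                     # drop the temporary "index" name
print(kept.index.tolist())  # [0, 1, 3, 4, 6, 7, 8]
```

An inner merge with the default `sort=False` keeps the order of the left keys, so the restored labels come out in the same order as in df2.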
