I have two pandas DataFrames. The first one contains reviews, with one row per token, as shown below:
import pandas as pd
df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "review": ["The second review", "The second review", "The second review",
               "This is the first review", "This is the first review",
               "This is the first review", "This is the first review",
               "This is the first review", "No Noo", "No Noo"],
    "token_num": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "token": ["The", "second", "review", "This", "is", "the", "first", "review", "No", "Noo"],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6]
})
print(df)
   review_num                    review  token_num   token  score
0           2         The second review          1     The    0.3
1           2         The second review          2  second   -0.6
2           2         The second review          3  review    0.4
3           1  This is the first review          1    This    0.5
4           1  This is the first review          2      is    0.6
5           1  This is the first review          3     the    0.7
6           1  This is the first review          4   first   -0.6
7           1  This is the first review          5  review    0.4
8           3                    No Noo          1      No    0.5
9           3                    No Noo          2     Noo    0.6
I then drop the row with the highest score in each review, which gives me df2:
df2 = df.drop(df.groupby('review_num',sort=False)['score'].idxmax())
print(df2)
   review_num                    review  token_num   token  score
0           2         The second review          1     The    0.3
1           2         The second review          2  second   -0.6
3           1  This is the first review          1    This    0.5
4           1  This is the first review          2      is    0.6
6           1  This is the first review          4   first   -0.6
7           1  This is the first review          5  review    0.4
8           3                    No Noo          1      No    0.5
The second DataFrame, which I want to merge with df2, is df_histo. It contains the original review and the new modified review:
review_num  review                    new_modified_review
2           The second review         XXXX
1           This is the first review  YYYY
3           No Noo                    ZZZ
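For reproducibility, df_histo can be built roughly like this. This is a sketch that assumes the frame has exactly the three columns printed above and a default integer index; XXXX, YYYY and ZZZ are placeholders for the real modified text.

```python
import pandas as pd

# Assumed construction of df_histo: one row per review, keyed by review_num.
# The new_modified_review values are placeholders, not real data.
df_histo = pd.DataFrame({
    "review_num": [2, 1, 3],
    "review": ["The second review", "This is the first review", "No Noo"],
    "new_modified_review": ["XXXX", "YYYY", "ZZZ"],
})
print(df_histo)
```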
I merge df2 and df_histo with this code:
df_merged = df2.merge(df_histo, on='review_num', how='inner')
print(df_merged)
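The values and the rows are all correct, but I lose the original index.
I get the index values 0, 1, 2, 3, 4, 5, 6, but I expected 0, 1, 3, 4, 6, 7, 8 (the same as in df2). How can I keep df2's index after the merge?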
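Here is a trimmed-down, self-contained reproduction of the index behaviour. It is only a sketch: to keep it short, the frames carry just the columns needed to show the problem (that reduction is my assumption, the real frames have the extra columns shown earlier).

```python
import pandas as pd

# Trimmed version of df: only the columns needed to show the index behaviour.
df = pd.DataFrame({
    "review_num": [2, 2, 2, 1, 1, 1, 1, 1, 3, 3],
    "token_num": [1, 2, 3, 1, 2, 3, 4, 5, 1, 2],
    "score": [0.3, -0.6, 0.4, 0.5, 0.6, 0.7, -0.6, 0.4, 0.5, 0.6],
})

# Drop the highest-scoring row of each review; the surviving rows keep their labels.
df2 = df.drop(df.groupby("review_num", sort=False)["score"].idxmax())

# Trimmed version of df_histo with placeholder values.
df_histo = pd.DataFrame({
    "review_num": [2, 1, 3],
    "new_modified_review": ["XXXX", "YYYY", "ZZZ"],
})

df_merged = df2.merge(df_histo, on="review_num", how="inner")

print(list(df2.index))        # [0, 1, 3, 4, 6, 7, 8]
print(list(df_merged.index))  # [0, 1, 2, 3, 4, 5, 6]
```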