I'm trying to port this code from Python (pandas) to R:
# Sort by user overall rating first
reviews = reviews.sort_values('review_overall', ascending=False)
# Keep the highest rating from each user and drop the rest
reviews = reviews.drop_duplicates(subset=['review_profilename', 'beer_name'], keep='first')
This is what I've written in R:
reviews_df <- df[order(-df$review_overall), ]
library(dplyr)
df_clean <- distinct(reviews_df, review_profilename, beer_name, .keep_all = TRUE)
The problem is that the Python version returns 1496263 records, while the R version returns 1496596 records.
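As a sanity check on the logic, here is a minimal library-free Python sketch of the sort-then-keep-first step on made-up rows. The names `rows` and `keep_best` are hypothetical, and the sketch assumes that missing profile names compare equal to each other. Since exactly one row survives per distinct (review_profilename, beer_name) pair, the final count should not depend on the sort order, so the gap of 333 rows presumably comes from how the two key columns are read or compared in each language rather than from the sorting step.

```python
# Toy rows, made up for illustration (not taken from the dataset).
rows = [
    {"review_profilename": "alice", "beer_name": "Stout", "review_overall": 3.5},
    {"review_profilename": "alice", "beer_name": "Stout", "review_overall": 4.5},
    {"review_profilename": "bob", "beer_name": "Stout", "review_overall": 4.0},
    # Assumption: a missing profile name (None) counts as equal to another missing one.
    {"review_profilename": None, "beer_name": "Lager", "review_overall": 2.0},
    {"review_profilename": None, "beer_name": "Lager", "review_overall": 5.0},
]


def keep_best(rows):
    """Sort by rating (highest first), then keep the first row per (user, beer) pair."""
    ordered = sorted(rows, key=lambda r: r["review_overall"], reverse=True)
    seen = set()
    kept = []
    for r in ordered:
        key = (r["review_profilename"], r["beer_name"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


deduped = keep_best(rows)

# The number of surviving rows equals the number of distinct key pairs,
# whatever the sort order was.
distinct_pairs = {(r["review_profilename"], r["beer_name"]) for r in rows}
print(len(deduped), len(distinct_pairs))
```

Counting the distinct key pairs on both sides (and the number of missing values in each key column) would show whether the two languages even agree before any sorting happens.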
link to dataset: dataset
Can someone help me find my mistake?