I have a dataset with 60000 rows, of which 14000 are duplicate rows (based on the id, which is CODE_2011). I want to keep only unique observations. The criterion for dropping a duplicate row is: "if there are two rows with the same CODE_2011, keep the row with the highest area". I am really new to programming and am struggling to write a loop for this. Can someone help me with this?
For example, rows 55 and 56 have the same id but different areas. So the goal is to keep row 56 and remove row 55, as the area of row 56 is greater than that of row 55.
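To make the rule concrete, here is a minimal sketch of the loop in plain Python. It assumes the rows are available as a list of dictionaries and that the area column is literally named `area`; the function name `keep_largest_area` and the sample values are made up for illustration and are not taken from the real dataset.

```python
def keep_largest_area(rows, id_key="CODE_2011", area_key="area"):
    """Return one row per id, keeping the row with the largest area.

    The column names are assumptions; adjust id_key and area_key to match the data.
    """
    best = {}  # maps each id to the best row seen so far
    for row in rows:
        code = row[id_key]
        # Keep this row if the id is new, or if its area beats the stored row.
        # On an exact tie, the row seen first is kept.
        if code not in best or row[area_key] > best[code][area_key]:
            best[code] = row
    return list(best.values())


# Illustrative data only: "B2" appears twice with different areas.
rows = [
    {"CODE_2011": "A1", "area": 10.0},
    {"CODE_2011": "B2", "area": 3.5},
    {"CODE_2011": "B2", "area": 7.2},
    {"CODE_2011": "C3", "area": 1.0},
]

unique_rows = keep_largest_area(rows)
print(unique_rows)
```

With the sample data, the two "B2" rows collapse into the one with area 7.2. If the data lives in a data frame rather than a list, the same idea is usually expressed as sorting by area in descending order and then dropping duplicates on CODE_2011, keeping the first occurrence.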