
I have a dataset with 60,000 rows, of which 14,000 are duplicates based on the ID column CODE_2011. I want to keep only unique observations. The rule for dropping a duplicate is: if two rows share the same CODE_2011, keep the row with the highest area. I am very new to programming and am struggling to write a loop for this. Can someone help me with this?

For example, rows 55 and 56 have the same ID but different areas. The goal is to keep row 56 and remove row 55, because the area of row 56 is greater than that of row 55.
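The thread contains no code, so here is a minimal sketch of the selection rule, written in plain Python only as a language-neutral illustration (the question itself appears to be about R). It assumes each row is a dict, and the column names `CODE_2011` and `area` plus the sample values are hypothetical stand-ins for the real data. A single pass with a dictionary keyed by ID is enough: for each row, store it only if no row with that ID has been seen yet or if its area is strictly larger than the stored one.

```python
from __future__ import annotations

from typing import Any


def keep_largest_area(rows: list[dict[str, Any]],
                      id_key: str = "CODE_2011",
                      area_key: str = "area") -> list[dict[str, Any]]:
    """Return one row per ID, keeping the row with the largest area.

    Assumption: rows are dicts with hypothetical keys `id_key` and `area_key`.
    Output keeps the order in which each ID first appears; on ties in area,
    the earlier row is kept.
    """
    best: dict[Any, dict[str, Any]] = {}
    for row in rows:
        key = row[id_key]
        current = best.get(key)
        # Store the row if this ID is new, or replace the stored row
        # only when this one has a strictly larger area.
        if current is None or row[area_key] > current[area_key]:
            best[key] = row
    return list(best.values())


# Made-up sample mirroring the example: rows 55 and 56 share an ID.
rows = [
    {"row": 54, "CODE_2011": "A", "area": 10.0},
    {"row": 55, "CODE_2011": "B", "area": 3.5},
    {"row": 56, "CODE_2011": "B", "area": 7.2},
]
print([r["row"] for r in keep_largest_area(rows)])  # [54, 56]
```

Because each row is visited once, this avoids nested loops over the 60,000 rows. In R, the same rule is commonly expressed without an explicit loop by sorting by ID and descending area with `order()` and then dropping the rows flagged by `duplicated()` on the ID column.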

Here is a glimpse of the data.

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we cannot copy/paste the data. – MrFlick Feb 18 '20 at 17:34
  • Yes, see https://stackoverflow.com/a/35469576/80626 – Kent Johnson Feb 18 '20 at 17:53
