I've got an interesting one for you all.
I'd like to first look through the ID column and identify duplicate values. Once those are identified, the code should compare the incomes of the rows sharing each duplicated ID and keep only the row with the largest income.
So if there are three ID values of 2, it will look for the one with the highest income and keep that row.
ID Income
1 98765
2 3456
2 67
2 5498
5 23
6 98
7 5645
7 67871
9 983754
10 982
10 2374
10 875
10 4744
11 6853
I know it's as easy as subsetting based on a condition, but I don't know how to remove rows based on whether the income in one row is greater than in another (only when the IDs match).
I was thinking of using an ifelse statement to create a new column that flags duplicates (through subsetting or not), then using ifelse again on that column's values to identify the larger income. From there I can just subset based on the new columns I have created.
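To make the helper-column idea concrete, here is a rough sketch of what I have in mind. It assumes the data sits in a data frame I've called df, with an extra column I've called MaxIncome; both names are just made up for this example. It uses base R's ave() to get the per-ID maximum instead of nested ifelse calls, so it may not be the cleanest way.

```r
# Hypothetical data frame holding the example data; the name `df` is my own choice.
df <- data.frame(
  ID     = c(1, 2, 2, 2, 5, 6, 7, 7, 9, 10, 10, 10, 10, 11),
  Income = c(98765, 3456, 67, 5498, 23, 98, 5645, 67871, 983754,
             982, 2374, 875, 4744, 6853)
)

# Helper column (name is an assumption): the largest income within each ID group.
df$MaxIncome <- ave(df$Income, df$ID, FUN = max)

# Keep only the rows whose income equals their group's maximum,
# then drop the helper column and reset the row names.
result <- df[df$Income == df$MaxIncome, c("ID", "Income")]
rownames(result) <- NULL
result
```

One thing I'm unsure about: if two rows with the same ID are tied for the highest income, this keeps both of them.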
Is there a faster, more efficient way of doing this?
The outcome should look like this:
ID Income
1 98765
2 5498
5 23
6 98
7 67871
9 983754
10 4744
11 6853
Thank you