I'm trying to modify a data frame because it contains duplicate Id values.

Data Frame:                                           
Id Name Account                                                    
1    X    1                                       
1    Y    2                                             
1    Z    3                                 
2    J    1                                                
2    T    4                                                 
3    O    2

When there are multiple rows with the same Id, I want to keep only the last one. The desired output would be:

Id Name Account                                                                                             
1    Z    3                                                                          
2    T    4                                                 
3    O    2

This is my current code:

for (i in 1:(nrow(mylist) - 1)) {
  if (mylist$Id[i] == mylist$Id[i + 1]) {
    mylist <- mylist[-i, ]
  }
}

I run into problems when a row is removed: all following rows shift to a lower index, so the loop skips a row in the next step.

Raphael

2 Answers

You can do this easily with the dplyr package:

library(dplyr)

mylist %>%
 group_by(Id) %>%
 slice(n()) %>%
 ungroup()

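First, group_by(Id) groups the rows by the Id column. Then slice(n()) keeps only the last row of each group, since n() is the number of rows in the current group. Finally, ungroup() removes the grouping again.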
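
For reference, here is a minimal reproducible sketch of the same pipeline on the example data from the question. The data frame is rebuilt by hand from the table above, so the column types are an assumption. Note that the pipeline returns a new data frame and does not change mylist in place, so the result has to be assigned to be kept.

```r
library(dplyr)

# Example data rebuilt from the question (column types are assumed)
mylist <- data.frame(
  Id      = c(1, 1, 1, 2, 2, 3),
  Name    = c("X", "Y", "Z", "J", "T", "O"),
  Account = c(1, 2, 3, 1, 4, 2),
  stringsAsFactors = FALSE
)

# Keep the last row of each Id group and assign the result back
mylist <- mylist %>%
  group_by(Id) %>%
  slice(n()) %>%
  ungroup()

print(mylist)
```

After this runs, mylist should contain three rows, one per Id, with the names Z, T and O.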

Cettt
  • Thanks :) I added it to my code with mylist <- (then your part). When I run those lines with Ctrl + Enter it works and the list gets shorter. When I run the whole program, the slice somehow isn't applied and the data is as big as before. There is no error :/ – Raphael Jul 03 '20 at 18:02

One option in base R is:

mylist[cumsum(sapply(split(mylist,mylist$Id),nrow)),]

  Id Name Account
3  1    Z       3
5  2    T       4
6  3    O       2
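
As far as I can tell, the cumsum approach relies on the rows already being sorted by Id, because split() returns the groups in sorted order and the cumulative counts only match the last row of each group when the rows are stored in that same order. A possible alternative that does not depend on the ordering is duplicated() with fromLast = TRUE. The sketch below rebuilds the example data by hand (column types are assumed) and uses result as an illustrative variable name.

```r
# Example data rebuilt from the question (column types are assumed)
mylist <- data.frame(
  Id      = c(1, 1, 1, 2, 2, 3),
  Name    = c("X", "Y", "Z", "J", "T", "O"),
  Account = c(1, 2, 3, 1, 4, 2),
  stringsAsFactors = FALSE
)

# duplicated(..., fromLast = TRUE) flags every row whose Id occurs again
# further down, so negating it keeps only the last row for each Id
result <- mylist[!duplicated(mylist$Id, fromLast = TRUE), ]

print(result)
```

On the example data this keeps the original rows 3, 5 and 6, the same rows as the output above.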
Daniel O