I'm traying to Select
only the data that is no duplicate in certain columns(ID, position1, position2 and Name). It doesn't matter that the hour be different, that row will be considered duplicate still. The problem is that I need to mantein the column Hour
| ID | position1 | position2| Hour | Name
---------------------------------------------------------
| ABD1 | 54340 | 33254| 10:42| Adam
| ABD1 | 54340 | 33254| 10:52| Adam
| TGF5 | 54545 | 54545| 11:25| Mike
| ABD1 | 84544 | 24424| 12:06| Adam
| AGF4 | 25466 | 45444| 10:53| Mike
| TGF5 | 44787 | 25847| 09:05| Rose
On the example, we can see that the rows 1 and 2 are duplicated on ID, position1, position2 and Name, so I want to eliminate one of them, doesn't matter wich one. I tryng to do this using
df <- sqldf('SELECT DISTINCT ID, position1, position2, Name FROM df')
The problem is that I lost the column Hour. How can I conser the same columns of df, but only eliminate the rows that are equals in certain columns.
This is the result that I need:
| ID | position1 | position2| Hour | Name
---------------------------------------------------------
| ABD1 | 54340 | 33254| 10:42| Adam
| TGF5 | 54545 | 54545| 11:25| Mike
| ABD1 | 84544 | 24424| 12:06| Adam
| AGF4 | 25466 | 45444| 10:53| Mike
| TGF5 | 44787 | 25847| 09:05| Rose