I have a data.table
and need to do some cleaning for multiple values that should be unique -- a simplified example should help:
> DT
id type
1: 1 A
2: 2 X
3: 3 X
4: 3 G
5: 3 NA
6: 4 D
7: 5 NA
The issue I'm having is multiple values for "type" for the same ID, or:
> DT[id == 3]
id type
1: 3 X
2: 3 G
3: 3 NA
The source of the multiple values is irrelevant, but I'd like to clean this in a specific manner: take the last reported value unless it's an NA
. So the cleaning should result in a single row per ID, and in the example case would look like:
> DTclean
id type
1: 1 A
2: 2 X
3: 3 G
4: 4 D
5: 5 NA
My current approach is to sort DT
by type
, so that all the NA
's are first, and then use duplicated
-- this is acceptable, but I feel there is a better method, plus though not crucial, this doesn't always take the last reported value -- in the above case it takes X instead of G.
This is my current approach:
> setorder(DT, type)
> DTclean <- DT[!duplicated(id, fromLast = T)]
> DTclean
id type
1: 1 A
2: 2 X
3: 3 X
4: 4 D
5: 5 NA
Any ideas/help would be much appreciated!