
I have the dataset shown below:

   Name    ID    DATES   R
1  @0CC 71476 20000704  11
2  @0CC 71476 20001204  11
3  @0RM 49960 20000131   2
4  @0RM 73565 20000919   1
5  @0RM 59451 20001023   1
6  @0RM 44457 20001214   1
7  @0TL 48061 20000627  31
8  @0TL 19824 20000929   3
9  @0TL 70970 20001211   1
10 @0TL 73862 20001212   2
11 @0TL 48061 20001227  31
12 @1AJ 58875 20001214   1
13 @1AJ 56014 20001214   3
14 @1AJ 47340 20001214   3
15 @1AJ 19813 20001214   3
16 @1AL 44416 20000303  31
17 @1AL 59184 20000413 323
18 @1AL 44416 20000517  31
19 @1AL 52718 20000621   1
20 @1AL 59184 20000707 323
21 @1AL 59184 20000801 323
22 @1AL 72832 20001127  43
23 @1AL 73568 20001130   3
24 @1AL 72832 20001211  43
25 @1FF 58781 20000719   1
26 @1FF 44505 20000801  12
27 @1FF 73559 20001110   1
28 @1FF 44505 20001218  12
29 @1FF 47276 20001227   3

What I'm trying to do is, for each unique combination of Name and ID, keep only the first row, so that I end up with a subset of this data frame, e.g.

   Name    ID    DATES   R
1  @0CC 71476 20000704  11
3  @0RM 49960 20000131   2
4  @0RM 73565 20000919   1
5  @0RM 59451 20001023   1
6  @0RM 44457 20001214   1
7  @0TL 48061 20000627  31
8  @0TL 19824 20000929   3
9  @0TL 70970 20001211   1
10 @0TL 73862 20001212   2
12 @1AJ 58875 20001214   1
13 @1AJ 56014 20001214   3
14 @1AJ 47340 20001214   3
15 @1AJ 19813 20001214   3
16 @1AL 44416 20000303  31
17 @1AL 59184 20000413 323
19 @1AL 52718 20000621   1
22 @1AL 72832 20001127  43
23 @1AL 73568 20001130   3
25 @1FF 58781 20000719   1
26 @1FF 44505 20000801  12
27 @1FF 73559 20001110   1
29 @1FF 47276 20001227   3

I am thinking of using two for loops:

for(i in unique(noanalysttest$IBTKR2)){
  for(j in unique(noanalysttest$AMASKCD)){
    R2<-subset(DT)
  }
}
R2

But this doesn't give me the result I want. Any help is much appreciated.

Thank you!

Donkeykongy
  • @Frank thanks for pointing out the duplicate. I was reading through that thread and tried ``df1[!duplicated(df1[c("Name", "ID")]),]``, which did not work, but ``unique(setDT(df1), by = c("Name", "ID"))`` does, after looking at @akrun's solution. Thanks a lot for the help. – Donkeykongy Jul 24 '16 at 15:06

1 Answer


We can use slice() from dplyr after grouping by 'Name' and 'ID' to keep the first row of each group:

library(dplyr)
df1 %>% 
    group_by(Name, ID) %>%
    slice(1)

Or a base R option would be

df1[!duplicated(df1[c("Name", "ID")]),]

Or using data.table

library(data.table)
unique(setDT(df1), by = c("Name", "ID"))

Or as @Frank suggested

setDT(df1)[, .SD[1L], by = .(Name, ID)]
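
To see what these one-liners keep, here is a minimal sketch using the base R variant on a handful of rows copied from the question. The object name `first_rows` is only illustrative, and the sketch assumes `df1` is a plain data.frame; on a data.table, the `df1[c("Name", "ID")]` column selection behaves differently.

```r
# A few rows from the question; row 2 repeats the (Name, ID) pair of row 1,
# and row 5 repeats the pair of row 4.
df1 <- data.frame(
  Name  = c("@0CC", "@0CC", "@0RM", "@0TL", "@0TL"),
  ID    = c(71476, 71476, 49960, 48061, 48061),
  DATES = c(20000704, 20001204, 20000131, 20000627, 20001227),
  R     = c(11, 11, 2, 31, 31)
)

# duplicated() marks every repeat of a (Name, ID) pair after its first
# occurrence; negating it keeps only the first row for each pair.
first_rows <- df1[!duplicated(df1[c("Name", "ID")]), ]
first_rows
```

Only the first occurrence of each Name/ID pair survives, and the original row order is preserved, which matches the shape of the desired output in the question.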
akrun