I have data that looks like this (approximately 4 million lines):
ID CODE DATE
A567001 F11 1/1/2019
A567001 T67 1/1/2019
A567001 P09 1/5/2019
A567001 F11 1/7/2019
A568002 F11 1/9/2019
A568002 A56 1/9/2019
A567891 C45 1/7/2019
A002456 F11 1/10/2019
A002456 H09 1/11/2019
A021324 F11 1/11/2019
A021324 G65 1/10/2019
B125983 F11 1/9/2019
C172749 H76 1/8/2019
... ... ...
What I want to do is if a particular ID has 1 or more instances of 'F11' in the "code" column, then I want to keep all instances of that ID (not just the line that has 'F11' in "code"). Otherwise, I want to remove the entire entry from the database and be left with only those IDs that have at least one F11.
In other words, this is the result I would like to have (removed two entries):
ID CODE DATE
A567001 F11 1/1/2019
A567001 T67 1/1/2019
A567001 P09 1/5/2019
A567001 F11 1/7/2019
A568002 F11 1/9/2019
A568002 A56 1/9/2019
A002456 F11 1/10/2019
A002456 H09 1/11/2019
A021324 F11 1/11/2019
A021324 G65 1/10/2019
B125983 F11 1/9/2019
... ... ...
I am assuming I can use dplyr and tried this command:
placeholder <- mutate(Flag = ifelse(file1$icd10_code == 'F11\\.*',1,0) %>% group_by(file1$new_id) %>% mutate (max_flag = max(flag)))
but get this error:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "c('double', 'numeric')"