My question is to improve the efficiency/elegance of my code. I have a df with a list of drugs. I want to identify the drugs that start with C09 and C10. If a person has these drugs, I want to give them a binary indicator (1=yes, 0=no) of whether they have these drugs. Binary indicator will be in a new column called "statins", in the same dataframe. I used this post as a guide: What's the R equivalent of SQL's LIKE 'description%' statement?.
Here is what I have done;
names<-c("tom", "mary", "mary", "john", "tom", "john", "mary", "tom", "mary", "tom", "john")
drugs<-c("C10AA05", "C09AA03", "C10AA07", "A02BC01", "C10AA05", "C09AA03", "A02BC01", "C10AA05", "C10AA07", "C07AB03", "N02AA01")
df<-data.frame(names, drugs)
df
names drugs
1 tom C10AA05
2 mary C09AA03
3 mary C10AA07
4 john A02BC01
5 tom C10AA05
6 john C09AA03
7 mary A02BC01
8 tom C10AA05
9 mary C10AA07
10 tom C07AB03
11 john N02AA01
ptn = '^C10.*?'
get_statin = grep(ptn, df$drugs, perl=T)
stats<-df[get_statin,]
names drugs
1 tom C10AA05
3 mary C10AA07
5 tom C10AA05
8 tom C10AA05
9 mary C10AA07
ptn2='^C09.*?'
get_other=grep(ptn2, df$drugs, perl=T)
other<-df[get_other,]
other
names drugs
2 mary C09AA03
6 john C09AA03
df$statins=ifelse(df$drugs %in% stats$drugs,1,0)
df
names drugs statins
1 tom C10AA05 1
2 mary C09AA03 0
3 mary C10AA07 1
4 john A02BC01 0
5 tom C10AA05 1
6 john C09AA03 0
7 mary A02BC01 0
8 tom C10AA05 1
9 mary C10AA07 1
10 tom C07AB03 0
11 john N02AA01 0
df$statins=ifelse(df$drugs %in% other$drugs,1,df$statins)
df
names drugs statins
1 tom C10AA05 1
2 mary C09AA03 1
3 mary C10AA07 1
4 john A02BC01 0
5 tom C10AA05 1
6 john C09AA03 1
7 mary A02BC01 0
8 tom C10AA05 1
9 mary C10AA07 1
10 tom C07AB03 0
11 john N02AA01 0
So, I can get what I want - but I feel there is probably a better, nicer way to do it and would appreciate any guidance here. An obvious solution that I can feel you all shouting at your screens is just use '^C' as a pattern - and therefore catch all the drugs beginning with C. I won't be able to do this in my main analysis as the 'C' will catch things that I don't want in some instances, so I need to make it as narrow as possible.