I have an problem in counting the number of combinations of drugs. My data is organized in two data frames. df1 contains id and found drugs, as such:
ID | drug
-----------
1 | drug1
1 | drug2
1 | drug3
2 | drug3
2 | drug5
3 | drug1
3 | drug3
3 | drug4
3 | drug5
df2 shows all possible drug combination of 2 different drugs, as such:
combi1 | combi2
-----------------
drug1 | drug2
drug1 | drug3
drug1 | drug4
drug2 | drug3
drug2 | drug4
drug2 | drug5
With 7140 possible combinations in total. What I want is to find out how many IDs have combination drug1-drug2
, drug1-drug3
and so forth.
I have been trying a double for
loop:
counter=0
for(com in 1:nrow(df2)){
for(id in 1:unique(df1$ID)){
if(df2$combi1[com] %in% df1$drug[id] & df2$combi2[com] %in% df1$drug[id]) {
counter=counter+1
}
}
df2$count=counter
counter=0
}
But it doesn't work, because it is only able to check one row at a time. I have also tried the solution in Find Most Frequent Combination within a Vector by Group, but without any luck.
Furthermore, I need to do the same with combinations of three drugs
EDIT: I like the output to be like this in df2, where I can see, how many times drug1 and drug2 has occurred as combination within ID. For example, only one ID had both drug1 and drug2, whereas 2 IDs had drug1 and drug3
combi1 | combi2 | count
-----------------------
drug1 | drug2 | 1
drug1 | drug3 | 2
drug1 | drug4 | 0
drug2 | drug3 | 1
drug2 | drug4 | 0
drug2 | drug5 | 0