I'm trying very hard to break out of my C mold but, as you'll see, it's still present in my R code. I know there will be a smart R way of doing this!
I'm essentially trying to go through a long list of individuals held in a data frame. Each individual can have multiple rows if they have taken more than one drug, or the same drug more than once. Each row has a drug name entry, similar to:
patientID drugname
1 A
2 A
2 B
3 C
3 C
4 A
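For anyone who wants to reproduce this, the table above could be built roughly as follows. This is only a sketch: exampleDF is a made-up name, and the column names follow the illustration rather than my real data (which uses patid and prodcode).

```r
# Toy data mirroring the table above.
# exampleDF and its column names are illustrative assumptions,
# not the real data set.
exampleDF <- data.frame(
  patientID = c(1, 2, 2, 3, 3, 4),
  drugname  = c("A", "A", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)
```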
I have a list containing the unique drug names from this data frame (A, B, C). I would like to build a data frame with columns drugname and drugCount. drugCount should hold the number of distinct patients who were prescribed each drug: a drug counts at most once per person, so it is more of a binary "was this drug given to person X?" check than a count of rows.
A start of an attempt using a very C-style manner:
uniqueDrugList <- unique(therapyDF$prodcode)
numDrugs <- length(uniqueDrugList)
idList <- unique(therapyDF$patid)  # assumed: the list of unique patient IDs
# one row per drug; counts start at zero
prevalenceDF <- data.frame(drugName = character(numDrugs), drugcount = integer(numDrugs), prevalence = numeric(numDrugs), stringsAsFactors = FALSE)
for (i in seq_along(idList)) {
  # all rows belonging to the current patient
  individualDF <- subset(therapyDF, patid == idList[[i]])
  for (j in seq_len(numDrugs)) {
    if (uniqueDrugList[[j]] %in% individualDF$prodcode) {
      # prevalenceDF <- ... somehow tally up here
    }
  }
}
First, I take a subset of my main data frame for each individual, using each ID from a list of unique IDs. Then, for each unique drug (and this is where it is slow), I check whether that drug is present in that individual's records. If it is, I would like to add 1 to that drug's entry; otherwise I move on, and eventually on to the next individual's subset.
Expected output:
drugname count
A 3
B 1
C 1
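One direction that might avoid the nested loops, as a rough sketch only: drop repeated (patient, drug) rows with unique(), then count the remaining rows per drug with table(). The names exampleDF, pairs and counts are made up for illustration, and the column names follow the toy table rather than my real data.

```r
# Toy data (illustrative names, not the real data set)
exampleDF <- data.frame(
  patientID = c(1, 2, 2, 3, 3, 4),
  drugname  = c("A", "A", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)

# Keep each (patient, drug) pair once, so repeat prescriptions
# of the same drug to the same person collapse to a single row
pairs <- unique(exampleDF[, c("patientID", "drugname")])

# Rows left per drug = number of distinct patients given that drug
counts <- table(pairs$drugname)

prevalenceDF <- data.frame(
  drugname  = names(counts),
  drugCount = as.integer(counts),
  stringsAsFactors = FALSE
)
print(prevalenceDF)
```

On the toy data this should give 3, 1 and 1 for A, B and C. I have not checked how it behaves on a large data frame.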