I have two data frames that look like this:
Table1:
Gender<-c("M","F","M","M","F")
CPTCodes<-c("15777, 19328, 19342, 19366, 19370, 19371, 19380","15777, 19357","19367, 49568","15777, 19357","15777, 19357")
Df<-tibble(Gender,CPTCodes)
Table2:
Code<-c(19328,19342,15777,49568,12345)
Value<-c(0.5,7,9,35,2)
Df2<-tibble(Code,Value)
And had previously asked this question about how to summarize the "values" from table 2 into a column in table 1, depending on how many codes were in the "Code" column of table 1. Turns out it was a duplicate of another question, but either way, the solutions there worked great! It did exactly what I asked.
Problem was that I didn't realize, buried deep down in the thousands of rows of Table 2, were some duplicate codes. I.e. table 2 really looked like this:
Code<-c(19357,19342,15777,49568,12345,15777,19357)
Modifier<-c("","","","","","a","a")
Value<-c(0.5,7,9,35,2,3,45)
Df2<-tibble(Code,Modifier,Value)
So when I use the suggested code:
Df %>% mutate(id = row_number()) %>% separate_rows(CPTCodes, sep = ", ", convert = TRUE) %>% left_join(Df2, by = c("CPTCodes" = "Code")) %>% group_by(id, Gender) %>% summarize(total = sum(Value, na.rm = TRUE))
It summarizes ALL of the codes in finds that match in Table2, and I really just want rows that dont have anything in the "modifier" column. Any ideas?
Lastly, the current code returns the summarized total in its own data frame, but it'd be cool if everything was still there from the original Table 1, and it just had an extra column with the new sum.