0

I have two tables:

Table 1-

Gender<-c("M","F","M","M","F")
CPTCodes<-c("15777, 19328, 19342, 19366, 19370, 19371, 19380","15777, 19357","19367, 49568","15777, 19357","15777, 19357")
Df<-tibble(Gender,CPTCodes)

Table 2-

Code<-c(19328,19342,15777,49568,12345)
Value<-c(0.5,7,9,35,2)
Df2<-tibble(Code,Value)

I would like to create a third column in table 1 that has the sum of the value (from table 2) of all of the codes from the "CPTCodes" column in table 1. So in theory I would be first separating the numbers in the CPTCodes column (as some rows have more codes than others), converting to numeric, then searching for that particular code in table 2's "Code" column, grabbing the corresponding value in table 2 and summing them to put back in table 1. For instance, for the very first row of table 1 I would be finding the corresponding value of 15777, adding it to the corresponding value of 19328, added to value of 19342.. etc.. etc..

I had thought about first using tidyverse's "separate" to split the cptcodes column into multiple columns with one code in each column, but not every row has the same number of codes and often there are 10+, is this the best way?

If this has already been answered somewhere please point me in the right direction.

Joe Crozier
  • 944
  • 8
  • 20
  • Separating the string into entries is good. `tidyr::separate` is clunky with many and variable numbers of separations, but you can turn it into a `list` column and then use `tidry::unnest`. (You actually want them to be separate rows, not separate columns, so this is more direct.) [Here's a FAQ on doing that](https://stackoverflow.com/a/43431847/903061). I'd suggest closing as a dupe of that. – Gregor Thomas Jul 29 '19 at 19:19
  • Something like `mutate(Df, CPTCodes = strsplit(CPTCodes, split = ", ")) %>% unnest` then a simple join, group, summarize. – Gregor Thomas Jul 29 '19 at 19:20
  • ...but looking at the dupe, apparently `tidyr::separate_rows` exists, which gets you there in one step. – Gregor Thomas Jul 29 '19 at 19:22
  • Pardon the ignorant question, but why would I want separate rows and not columns? @Gregor – Joe Crozier Jul 29 '19 at 19:23
  • 1
    Because having them in different rows lets you `%>% left_join(Df) %>% group_by(ID column) %>% summarize(total = sum(Value))`. But joining, grouping, summing, etc. are much easier to do based on one or two columns rather than on a variable number of columns. – Gregor Thomas Jul 29 '19 at 20:02
  • 1
    A full answer would be `Df %>% mutate(id = row_number()) %>% separate_rows(CPTCodes, sep = ", ", convert = TRUE) %>% left_join(Df2, by = c("CPTCodes" = "Code")) %>% group_by(id, Gender) %>% summarize(total = sum(Value, na.rm = TRUE))` – Gregor Thomas Jul 29 '19 at 20:07

0 Answers0