0

I have just set up my dataframe from census data and a health administrative database in long format - to get counts/denoms by subgroups.

Here is a sample of what it looks like

YR  Region   DI   Cat         Sex    Age15cat    n       Age15Pop    weight          
2014  ON     Mat   Low         F       45-59     574     913430     0.5258552 
2014  ON     Mat   Low         F       60-74     2227    657160     0.3208399 
2014  ON     Mat   Low         F        75+      2999    300750     0.1533049
2014  ON     Mat   Low         M       45-59     585     865105     0.5258552
2014  ON     Mat   Low         M       60-74     2120    605290     0.3208399
2014  ON     Mat   Low         M        75+      3150    233935     0.1533049

YR could also be 2015-2017, DI is either Mat or Soc, Cat is either Low or High

Within each region, DI, and cat I am looking to calculate a crude and age-standardized rate with confidence intervals for females, males and total (understanding that I will probably have to reshape to get this).

YR    Region  DI    Cat  Sex Rate.t  Estimate     lci       uci  
2014    ON     Mat   Low   M  c.rate    0.0281   0.0257     0.0306 
2014    ON     Mat   Low   M  s.rate    0.0231   0.0210     0.0246
2014    ON     Mat   Low   F  c.rate    0.0221   0.0201     0.0232  
2014    ON     Mat   Low   F  s.rate    0.0241   0.0237     0.0246 
2014    ON     Mat   Low   T  c.rate    0.0251   0.0220     0.0267 
2014    ON     Mat   Low   T  s.rate    0.0255   0.0230     0.0260 


Then, I would like to calculate risk ratios and risk differences (with CIs) that are crude and age-standardized within each region and DI - Low divided by High and Low minus High for females, males and total and eventually get the following output (realizing I will probably have to reshape).

YR    Region   DI    Sex  Crude_Std  Measure  Estimate   lci    uci 
2014   ON      Mat    F    crude      RR        1.12     1.00   1.20 
2014   ON      Mat    M    crude      RR        1.89     1.22   3.00 
2014   ON      Mat    T    crude      RR        1.30     1.12   1.52 
2014   ON      Mat    F    crude      RD        0.23     0.21   0.24  
2014   ON      Mat    M    crude      RD        -0.01    -0.05  0.03
2014   ON      Mat    T    crude      RD         0.10     0.05  0.15 
2014   ON      Mat    F    std        RR         1.03     1.00  1.05 
2014   ON      Mat    M    std        RR         1.50     1.42  1.60  
 ….

I would like to find a way to do this in tidyverse without having to manually code functions (especially for calculating confidence intervals).

I've tried to play around with code from another post (see below)

df.copd2 %>% 
  group_by(YR,Region, DI, Cat, Sex) %>% 
  summarise(age_adjust = list(ageadjust.direct(count = n,
                                               pop = Age15Pop, stdpop = weight))) %>%
  mutate(age_adjust = map(age_adjust, as.data.frame.list))  %>% 
  unnest

This is working now. I will rerun a group_by without sex to get total and then bind_rows. It's still missing CIs for crude rate though - how do I get that?

I'm not sure how to get rate ratios and rate differences with CIs. I think the package dsrr would be the way to go, but I'm not sure how to do it with tidyverse.

Any help is greatly appreciated!

MegPophealth
  • 149
  • 1
  • 10
  • You may have a typo in the `group_by` function. The error would seem to indicate that one of the variable names is assume to be a function and not a column name. Did you use "Category" instead of "Cat"? – Dave2e Sep 18 '19 at 15:39
  • I had changed Category to be shorter for my example - I will change it to be Cat throughout so it's consistent.. This wasn't the problem, however – MegPophealth Sep 18 '19 at 16:06
  • I've updated - got some code working – MegPophealth Sep 18 '19 at 16:24

0 Answers0