Improve coding -- using tidyverse to compute confidence intervals (for proportions) and add the results to individual rows

Question

I'm here again to ask the colleagues to help me to improve this code. Since this forum is so receptive, I would like to hear your thoughts about this situation!

The question is pretty simple but needs knowledge about looping / for() / by() etc and I'm almost autodidact when programming in R:

I'm working on a dataset where I have estimates for proprtions and I want to compute the lower and upper boundaries for the confidence interval. The work consists of transforming this:

to this:

There is a package that computes that for me

library(PropCIs)

The solution was intuitive (for me). 1) Get every point estimate,
2) compute its lower and upper boundaries and
3) move to the next row and
4) Return to the first step.

My workaround was based on this post here and is described below step by step. However, I imagine this solution is too slow, naive or works like I'm talking with a strange accent to any native data scientist. Then, I'm wondering if tidyverse could help me to improve that.

library(PropCIs)
library(tidyverse)
set.seed(123)
ds <- data.frame(estimate = runif(15, min=0, max=1), 
                 sample = sample(x = 10:15, 15, replace = T))

ds <- ds %>% mutate(lower = '')  

#looping
for(i in 1:nrow(ds)) { 
  ds$lower[i] <- blakerci(ds$sample[i], 3449, conf.level=0.95)
}

#row to columns
ds <- separate(data = ds, col = lower, into = c("lower", "upper"), sep = ",")

#replace strings
ds <- ds %>% mutate(lower = gsub("c(", "", lower, fixed = TRUE),
                  upper = gsub(")", "", upper, fixed = TRUE))

#Transform to numeric
ds <- ds %>% mutate_at(vars(lower, upper), funs(as.numeric(.)))

As always, thanks much for all support!

Please take into consideration this post has a reproducible script and can help other people! =)

This seems like it might be a better question for [codereview.se] There's not really a specific programming question here. Otherwise be more specific about what exactly needs to change. What exactly are you trying to fix? — MrFlick, Nov 06 '18 at 16:32
Thanks, @MrFlick. Actually, is not a programming question, but I question to help me to improve this code using tidyverse family if possible. I did not know about the Code Review section!!! I'll move this question to there ! — Luis, Nov 06 '18 at 20:41

score 1 · Accepted Answer · answered Nov 06 '18 at 16:58

I don't use PropCIs before, but it seems cool to think in probability. Here is the code in tidyverse style.

library(PropCIs)
library(tidyverse)
library(data.table)
#> 
#> 载入程辑包：'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose
set.seed(123)
data.table(
    # data.table is faster.
    estimate = runif(15, min=0, max=1)
    ,sample = sample(x = 10:15, 15, replace = T)
) %>% 
    mutate(
        ci = map(sample
                    ,~blakerci(., 3449, conf.level=0.95) %>% 
                    .$conf.int %>% 
                    .[1:2]
                    # get lower and upper
                    )
        # why n = 3449? Any assumption?
    ) %>% 
    mutate(
        lower = map_dbl(ci,~.[1])
        ,upper = map_dbl(ci,~.[2])
    ) %>% 
    select(-ci)
#>     estimate sample       lower       upper
#> 1  0.2875775     15 0.002476132 0.007123028
#> 2  0.7883051     11 0.001593144 0.005669411
#> 3  0.4089769     10 0.001481219 0.005245599
#> 4  0.8830174     11 0.001593144 0.005669411
#> 5  0.9404673     15 0.002476132 0.007123028
#> 6  0.0455565     15 0.002476132 0.007123028
#> 7  0.5281055     14 0.002350896 0.006691190
#> 8  0.8924190     13 0.002038424 0.006396837
#> 9  0.5514350     15 0.002476132 0.007123028
#> 10 0.4566147     13 0.002038424 0.006396837
#> 11 0.9568333     14 0.002350896 0.006691190
#> 12 0.4533342     13 0.002038424 0.006396837
#> 13 0.6775706     13 0.002038424 0.006396837
#> 14 0.5726334     11 0.001593144 0.005669411
#> 15 0.1029247     10 0.001481219 0.005245599

^{Created on 2018-11-07 by the reprex package (v0.2.1)}

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  zh_CN.UTF-8                 
#>  tz       Asia/Shanghai               
#>  date     2018-11-07
#> Packages -----------------------------------------------------------------
#>  package    * version date       source         
#>  assertthat   0.2.0   2017-04-11 CRAN (R 3.5.0) 
#>  backports    1.1.2   2017-12-13 CRAN (R 3.5.0) 
#>  base       * 3.5.1   2018-07-05 local          
#>  bindr        0.1.1   2018-03-13 CRAN (R 3.5.0) 
#>  bindrcpp   * 0.2.2   2018-03-29 CRAN (R 3.5.0) 
#>  broom        0.5.0   2018-07-17 CRAN (R 3.5.0) 
#>  cellranger   1.1.0   2016-07-27 CRAN (R 3.5.0) 
#>  cli          1.0.0   2017-11-05 CRAN (R 3.5.0) 
#>  colorspace   1.3-2   2016-12-14 CRAN (R 3.5.0) 
#>  compiler     3.5.1   2018-07-05 local          
#>  crayon       1.3.4   2017-09-16 CRAN (R 3.5.0) 
#>  data.table * 1.11.8  2018-09-30 cran (@1.11.8) 
#>  datasets   * 3.5.1   2018-07-05 local          
#>  devtools     1.13.6  2018-06-27 CRAN (R 3.5.0) 
#>  digest       0.6.16  2018-08-22 cran (@0.6.16) 
#>  dplyr      * 0.7.6   2018-06-29 CRAN (R 3.5.1) 
#>  evaluate     0.11    2018-07-17 CRAN (R 3.5.0) 
#>  forcats    * 0.3.0   2018-02-19 CRAN (R 3.5.0) 
#>  ggplot2    * 3.0.0   2018-07-03 CRAN (R 3.5.0) 
#>  glue         1.3.0   2018-07-17 CRAN (R 3.5.0) 
#>  graphics   * 3.5.1   2018-07-05 local          
#>  grDevices  * 3.5.1   2018-07-05 local          
#>  grid         3.5.1   2018-07-05 local          
#>  gtable       0.2.0   2016-02-26 CRAN (R 3.5.0) 
#>  haven        1.1.2   2018-06-27 CRAN (R 3.5.0) 
#>  hms          0.4.2   2018-03-10 CRAN (R 3.5.0) 
#>  htmltools    0.3.6   2017-04-28 CRAN (R 3.5.0) 
#>  httr         1.3.1   2017-08-20 CRAN (R 3.5.0) 
#>  jsonlite     1.5     2017-06-01 CRAN (R 3.5.0) 
#>  knitr        1.20    2018-02-20 CRAN (R 3.5.0) 
#>  lattice      0.20-35 2017-03-25 CRAN (R 3.5.1) 
#>  lazyeval     0.2.1   2017-10-29 CRAN (R 3.5.0) 
#>  lubridate    1.7.4   2018-04-11 CRAN (R 3.5.0) 
#>  magrittr     1.5     2014-11-22 CRAN (R 3.5.0) 
#>  memoise      1.1.0   2017-04-21 CRAN (R 3.5.0) 
#>  methods    * 3.5.1   2018-07-05 local          
#>  modelr       0.1.2   2018-05-11 CRAN (R 3.5.0) 
#>  munsell      0.5.0   2018-06-12 CRAN (R 3.5.0) 
#>  nlme         3.1-137 2018-04-07 CRAN (R 3.5.1) 
#>  pillar       1.3.0   2018-07-14 CRAN (R 3.5.0) 
#>  pkgconfig    2.0.1   2017-03-21 CRAN (R 3.5.0) 
#>  plyr         1.8.4   2016-06-08 CRAN (R 3.5.0) 
#>  PropCIs    * 0.3-0   2018-02-23 CRAN (R 3.5.0) 
#>  purrr      * 0.2.5   2018-05-29 CRAN (R 3.5.0) 
#>  R6           2.3.0   2018-10-04 cran (@2.3.0)  
#>  Rcpp         0.12.19 2018-10-01 cran (@0.12.19)
#>  readr      * 1.1.1   2017-05-16 CRAN (R 3.5.0) 
#>  readxl       1.1.0   2018-04-20 CRAN (R 3.5.0) 
#>  rlang        0.2.2   2018-08-16 cran (@0.2.2)  
#>  rmarkdown    1.10    2018-06-11 CRAN (R 3.5.0) 
#>  rprojroot    1.3-2   2018-01-03 CRAN (R 3.5.0) 
#>  rvest        0.3.2   2016-06-17 CRAN (R 3.5.0) 
#>  scales       1.0.0   2018-08-09 CRAN (R 3.5.0) 
#>  stats      * 3.5.1   2018-07-05 local          
#>  stringi      1.2.4   2018-07-20 CRAN (R 3.5.0) 
#>  stringr    * 1.3.1   2018-05-10 CRAN (R 3.5.0) 
#>  tibble     * 1.4.2   2018-01-22 CRAN (R 3.5.0) 
#>  tidyr      * 0.8.1   2018-05-18 CRAN (R 3.5.0) 
#>  tidyselect   0.2.5   2018-10-11 cran (@0.2.5)  
#>  tidyverse  * 1.2.1   2017-11-14 CRAN (R 3.5.0) 
#>  tools        3.5.1   2018-07-05 local          
#>  utils      * 3.5.1   2018-07-05 local          
#>  withr        2.1.2   2018-03-15 CRAN (R 3.5.0) 
#>  xml2         1.2.0   2018-01-24 CRAN (R 3.5.0) 
#>  yaml         2.2.0   2018-07-25 CRAN (R 3.5.0)

Awesome. This solution was pretty elegant and much more understandable when compared to mine. 10/10!! (n = 3449 was a previous assumption I forgot to mention) — Luis, Nov 06 '18 at 20:49

Improve coding -- using tidyverse to compute confidence intervals (for proportions) and add the results to individual rows

1 Answers1