I'm back to ask for help improving this code. Since this forum is so receptive, I would like to hear your thoughts on this situation!
The question is pretty simple, but it requires knowledge of looping (for(), by(), etc.), and I'm mostly self-taught in R:
I'm working on a dataset where I have estimates of proportions, and I want to compute the lower and upper bounds of the confidence interval for each one. The work consists of transforming this:
to this:
There is a package that computes this for me:
library(PropCIs)
The solution was intuitive (to me):
1) get a point estimate,
2) compute its lower and upper bounds,
3) move to the next row, and
4) return to the first step.
My workaround was based on this post here and is described below step by step. However, I suspect this solution is too slow or naive, or that it reads like a strange accent to a native data scientist. So I'm wondering whether the tidyverse could help me improve it.
library(PropCIs)
library(tidyverse)
set.seed(123)
ds <- data.frame(estimate = runif(15, min = 0, max = 1),
                 sample = sample(x = 10:15, 15, replace = TRUE))
ds <- ds %>% mutate(lower = '')
# loop over the rows, one interval per row
for (i in seq_len(nrow(ds))) {
  ds$lower[i] <- blakerci(ds$sample[i], 3449, conf.level = 0.95)
}
# split the single column into two columns (lower and upper)
ds <- separate(data = ds, col = lower, into = c("lower", "upper"), sep = ",")
# strip the leftover "c(" and ")" from the strings
ds <- ds %>% mutate(lower = gsub("c(", "", lower, fixed = TRUE),
                    upper = gsub(")", "", upper, fixed = TRUE))
# convert to numeric (across() replaces the deprecated mutate_at()/funs())
ds <- ds %>% mutate(across(c(lower, upper), as.numeric))
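For comparison, here is a minimal sketch of the kind of tidyverse version I have in mind, which would skip the string round trip (separate(), gsub(), as.numeric()) entirely. It is not a definitive solution: it assumes that blakerci() returns a list whose conf.int element is a numeric vector c(lower, upper), and the intermediate name ci is only illustrative.

```r
library(PropCIs)
library(dplyr)
library(purrr)

set.seed(123)
ds <- data.frame(estimate = runif(15, min = 0, max = 1),
                 sample = sample(x = 10:15, 15, replace = TRUE))

# Assumption: blakerci() returns a list with a conf.int element
# holding c(lower, upper). One interval is computed per row.
ci <- map(ds$sample, ~ blakerci(.x, 3449, conf.level = 0.95)$conf.int)

# Pull the two bounds out as plain numeric columns.
ds <- ds %>%
  mutate(lower = map_dbl(ci, ~ .x[1]),
         upper = map_dbl(ci, ~ .x[2]))
```

The arguments passed to blakerci() are the same as in the loop above, so if the assumption holds this should give the same bounds, already stored as numbers.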
As always, thank you very much for all the support!
Please note that this post includes a reproducible script and may help other people! =)