This question aims to receive feedback to make a function more efficient. Apologies for a long, case specific post.
I created a function that calculates percentages in estimates of the American Community Survey (ACS). Because estimates in the ACS have margins of error, calculating percentages (e.g. % of the total population being below 17 years old) requires the recalculation of the error that results from dividing the estimate of both variables (population below 17 / total population).
So to calculate the new margin of error for a proportion calculated as p = estimate_a/estimate_b
, the formula to use is MOE(p) = (1/estimate_b)*sqrt((MOE_b^2)-(p^2*MOE_a^2))
. If the value inside of the square root was negative, then the substraction should be changed to a sum, with the formula becoming MOE(p) = (1/estimate_b)*sqrt((MOE_b^2)+(p^2*MOE_a^2))
. If the result of p = estimate_a/estimate_b
is 1, the documentation suggest calculating MOE using another formula: MOE(p) = MOE_a/estimate_b
To make these calculations, I created a function that takes a data frame with estimates and their MOEs, calculates the proportion between two specified variables, and writes two new columns in the original dataframe - one with the proportion, and another one with its margin of error. The function loops through the rows of the data frame carrying out if-else checks to determine what formula to apply, including skipping rows that might have NA values. The original data on which I apply this function is considerably long - ~250000 rows, and the structure of this function makes it go very slowly. Hence, the question is whether there are ways to improve the quality of this code to improve its speed. The function and dummy data are provided below:
percent_calculator <- function(DF, A_e, B_e, A_se, B_se, New_fn){
# arguments legend >> DF = data frame; A_e = estimate_a (string of the fieldname); B_e = estimate_b (string of the fieldname);
# A_se = MOE_a (string of the fieldname); B_se = MOE_b (string of the fieldname); New_fn = root for new fieldname in the data frame (string)
pb<- txtProgressBar(min = 0, max = nrow(DF), initial = 0) # progress bar initialization
for (i in 1:nrow(DF)){ # for loop that iterates through the rows of the DF
setTxtProgressBar(pb,i)
if(is.na(DF[[A_e]][i])==FALSE & is.na(DF[[B_e]][i])==FALSE){ # check if any of the estimates used to calculate the proportion is NA (if so, skip)
if (DF[[B_e]][i]!= 0){ # check if estimate_b is not 0, to avoid creating inf values from A_e/B-e
DF[[paste0(New_fn, "_e")]][i] <- (DF[[A_e]][i]/DF[[B_e]][i])
if(DF[[paste0(New_fn, "_e")]][i] == 1){ # check if P==1 to then use the appropiate formula for MOE
DF[[paste0(New_fn, "_se")]][i] <- (DF[[A_se]][i]/DF[[B_e]][i])
} else {
if((DF[[A_se]][i]^2)-(DF[[paste0(New_fn, "_e")]][i]^2)*(DF[[B_se]][i]^2)>= 0){ # check for the sign of the value inside of the square root
DF[[paste0(New_fn, "_se")]][i] <- (1/DF[[B_e]][i])*sqrt((DF[[A_se]][i]^2)-(DF[[paste0(New_fn, "_e")]][i]^2)*(DF[[B_se]][i]^2))
} else {
DF[[paste0(New_fn, "_se")]][i] <- (1/DF[[B_e]][i])*sqrt((DF[[A_se]][i]^2)+(DF[[paste0(New_fn, "_e")]][i]^2)*(DF[[B_se]][i]^2))
}
}
} else { # assign 0 value if B_e was 0
DF[[paste0(New_fn, "_e")]][i] <- 0
DF[[paste0(New_fn, "_se")]][i] <- 0
}
} else { # assign NA if any of the estimates was NA
DF[[paste0(New_fn, "_e")]][i] <- NA
DF[[paste0(New_fn, "_se")]][i] <- NA
}
DF[[paste0(New_fn, "_e")]][i] <- DF[[paste0(New_fn, "_e")]][i]*100 # switch from proportion to percentage in the estimate value
DF[[paste0(New_fn, "_se")]][i] <- DF[[paste0(New_fn, "_se")]][i]*100 # switch from proportion to percentage in the MOE value
}
return(DF)
}
Dummy <- structure(list(TotPop_e = c(636L, 1287L, 810L, 1218L, 2641L,
835L, 653L, 1903L, 705L, 570L, 2150L, 6013L, 1720L, 2555L, 1150L,
2224L, 1805L, 728L, 2098L, 3099L, 4194L, 1909L, 2401L, 1446L,
1345L, 1573L, 2037L, 634L, 1916L, 1522L, 592L, 831L, 577L, 2196L,
1482L, 1436L, 1668L, 3095L, 3677L, 2641L, 1285L, 932L, 2461L,
1609L, 1143L, 1617L, 1075L, 1280L, 838L, 1447L, 3941L, 2402L,
1130L, 851L, 10316L, 9576L, 2396L, 3484L, 5688L, 2200L, 1856L,
1441L, 2539L, 3056L, 1325L, 2454L, 2010L, 2340L, 1448L, 2435L,
2782L, 3633L, 1766L, 2564L, 1473L, 1214L, 1951L, 2561L, 4262L,
2576L, 4257L, 2314L, 2071L, 3182L, 1839L, 2214L, 1101L, 1898L,
790L, 867L, 1764L, 970L, 1320L, 2850L, 1019L, 1483L, 3720L, 2215L,
3581L, 3391L), TotPop_se = c(132.522796352584, 149.544072948328,
127.051671732523, 130.091185410334, 232.826747720365, 135.562310030395,
100.303951367781, 176.29179331307, 114.285714285714, 96.6565349544073,
339.817629179331, 438.297872340425, 245.592705167173, 324.012158054711,
333.130699088146, 224.924012158055, 321.580547112462, 169.604863221885,
175.075987841945, 469.908814589666, 375.075987841945, 411.550151975684,
378.115501519757, 235.258358662614, 241.337386018237, 291.793313069909,
337.386018237082, 138.601823708207, 145.896656534954, 193.920972644377,
135.562310030395, 117.325227963526, 244.984802431611, 318.54103343465,
207.90273556231, 200, 279.635258358663, 657.750759878419, 401.215805471125,
401.823708206687, 229.787234042553, 139.817629179331, 303.951367781155,
201.215805471125, 200, 252.887537993921, 356.838905775076, 241.945288753799,
238.297872340426, 267.477203647416, 320.9726443769, 255.31914893617,
178.115501519757, 116.109422492401, 891.793313069909, 766.565349544073,
255.31914893617, 463.22188449848, 448.632218844985, 367.781155015198,
269.300911854103, 261.398176291793, 286.93009118541, 446.808510638298,
224.316109422492, 212.158054711246, 233.434650455927, 304.559270516717,
356.231003039514, 275.379939209726, 330.699088145897, 368.996960486322,
248.024316109423, 310.030395136778, 153.799392097264, 243.768996960486,
265.65349544073, 337.386018237082, 436.474164133739, 359.270516717325,
344.072948328268, 196.960486322188, 231.003039513678, 356.231003039514,
212.158054711246, 348.328267477204, 206.079027355623, 240.729483282675,
196.352583586626, 141.033434650456, 215.80547112462, 127.659574468085,
248.024316109423, 589.057750759878, 231.61094224924, 486.93009118541,
605.471124620061, 713.06990881459, 488.753799392097, 382.370820668693
), Under17_se = c(35.8095476596307, 50.9877853224243, 50.0994474845873,
44.7376765786604, 113.994325548832, 59.7386237841673, 22.7862186188344,
95.1285234870203, 42.3093316505904, 35.4621507988699, 143.021311606928,
205.334390935311, 102.292167403598, 115.712493289527, 88.9617416652971,
98.0345650964952, 149.50823698925, 40.0016629212452, 86.7428425216985,
158.047696828218, 173.225615182675, 144.710221534209, 121.094774232467,
76.9999466678128, 88.9160360898593, 97.7665610480423, 133.02517642826,
30.4983051540691, 83.3625069421341, 75.7125713164268, 50.3826325227805,
37.5622898620679, 7.29483282674772, 122.185425418875, 83.4644035953588,
63.8384709681463, 99.5458131127046, 208.446825330589, 150.282359742524,
206.017151858922, 87.7761872483956, 56.194023821941, 120.701992909334,
50.6423479626955, 55.4225960853081, 93.2888100499867, 126.879946773287,
143.069104861932, 86.7747884744339, 79.4517480028886, 140.260959630942,
125.115775875384, 52.187662082273, 38.1819057688564, 365.828168907497,
380.635956883794, 135.735302000757, 213.321896356121, 198.507936644685,
126.535797699776, 141.516048792542, 114.238818548927, 117.737122860635,
165.644292987747, 71.238834852709, 93.0825940979755, 41.8438489710712,
97.0666682368976, 86.5060758100772, 92.8659724484427, 76.6536183156139,
192.822109819002, 101.83958502542, 139.341067042001, 55.3992539361667,
92.106793773051, 78.2330906844691, 115.177918141833, 207.546042154974,
139.609995160777, 153.568552211039, 73.5738128652025, 112.249861520572,
171.38868664475, 66.0687084216098, 181.939713349267, 28.4417934718288,
90.1132509720827, 57.4202669424023, 46.8440239496863, 80.4799857926917,
42.6875862955885, 81.3500156027725, 142.669475129055, 23.4653605661019,
191.159072511375, 159.615857998832, 191.592580855392, 184.123292172321,
125.375425911215), Under17_e = c(123, 284, 189, 228, 661, 180,
49, 500, 121, 115, 686, 1456, 385, 578, 302, 476, 738, 124, 527,
803, 1219, 459, 614, 218, 229, 422, 543, 69, 536, 306, 149, 80,
0, 520, 281, 270, 454, 669, 905, 978, 282, 178, 630, 187, 145,
367, 327, 577, 225, 246, 966, 629, 211, 65, 2857, 3051, 592,
1162, 1322, 464, 490, 264, 576, 617, 326, 695, 169, 381, 309,
476, 355, 915, 431, 869, 269, 358, 335, 650, 1443, 561, 900,
411, 759, 1265, 171, 833, 45, 255, 134, 144, 339, 203, 388, 413,
66, 416, 654, 565, 700, 362)), row.names = c(NA, 100L), class = "data.frame")
# example run to calculate pct of people below 17
Dummy <- percent_calculator(Dummy , "Under17_e", "TotPop_e", "Under17_se", "TotPop_se", "P_Bel17")