I have a dataset with several groups, where I want to calculate a median value for each group using dplyr. The data are weighted, and the weights need to be taken into account in calculating the median. I found the weighted.median
function from spatstat which seems to work fine. Consider the following simplified example:
require(spatstat, dplyr)
tst <- data.frame(group = rep(c(1:5), each = 100))
tst$val = runif(500) * tst$group
tst$wt = runif(500) * tst$val
tst %>%
group_by(group) %>%
summarise(weighted.median(val, wt))
# A tibble: 5 × 2
group `weighted.median(val, wt)`
<int> <dbl>
1 1 0.752
2 2 1.36
3 3 1.99
4 4 2.86
5 5 3.45
However, I would also like to add 95% confidence intervals to these values, and this has me stumped. Things I've considered:
- Spatstat also has a
weighted.var
function but there's no documentation, and it's not even clear to me whether this is variance around the median or mean. - This rcompanion post suggests various methods for calculating CIs around medians, but as far as I can tell none of them handle weights.
- This blog post suggests a function for calculating CIs and a median for weighted data, and is the closest I can find to what I need. However, it doesn't work with my dplyr groupings. I suppose I could write a loop to do this one group at a time and build the output data frame, but that seems cumbersome. I'm also not totally sure I understand the function in the post and slightly suspicious of its results- for instance, testing this out I get wider estimates for
alpha=0.1
than foralpha=0.05
, which seems backwards to me. Edit to add: upon further investigation, I think this function works as intended if I usealpha=0.95
for 95% CIs, rather thanalpha = 0.05
(at least, this returns values that feel intuitively about right). I can also make it work with dplyr by editing to return just a single moe value rather than a pair of high/low estimates. So this may be a good option- but I'm also considering others.
Is there an existing function in some library somewhere that can do what I want, or an otherwise straightforward way to implement this?