
I am trying to obtain some summary statistics in R using the dplyr package. Although weighted means are easy to get, I struggle with weighted SD. Typically I use the radiant.data package, but for this analysis I want to get the standard deviation by two grouping variables (time and gender).

Below is the code I am using for obtaining weighted means:

df %>%
  group_by(time, gender) %>%
  summarise(Mean = weighted.mean(x, w = weights, na.rm = TRUE))

Typically, I use the below code for weighted SD:

weighted.sd(df$x, df$weights, na.rm = T)

However, I cannot get that function to work within dplyr. Any ideas?

Additionally, is there any way to combine functions so that I can see two columns, one for weighted mean and one for weighted SD?

Thanks!

  • I think this question/answer is related; the `Hmisc` package looks like it could help: https://stackoverflow.com/questions/10049402/calculating-weighted-mean-and-standard-deviation – Silentdevildoll Aug 02 '22 at 22:40

1 Answer


You don't provide a reproducible example. However, grouping and summarizing with the weighted.sd() function from radiant.data seems to work fine in a dplyr pipeline:

library(tidyverse)

mtcars |>
  group_by(vs, cyl) |>
  # stats::weighted.mean() takes its weights as `w`; a `wt` argument is silently ignored
  summarize(w_mean = weighted.mean(x = mpg, w = hp),
            w_sd = radiant.data::weighted.sd(x = mpg, wt = hp))
#> `summarise()` has grouped output by 'vs'. You can override using the `.groups`
#> argument.
#> # A tibble: 5 × 4
#> # Groups:   vs [2]
#>      vs   cyl w_mean  w_sd
#>   <dbl> <dbl>  <dbl> <dbl>
#> 1     0     4   26   0    
#> 2     0     6   20.4 0.646
#> 3     0     8   14.9 2.39 
#> 4     1     4   26.1 4.48 
#> 5     1     6   19.1 1.39

Created on 2022-08-02 by the reprex package (v2.0.1)

Note that radiant.data masks a lot of functions from the Tidyverse packages, which might cause other problems, so I called weighted.sd() with the radiant.data:: prefix rather than attaching the package.
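If you would rather not depend on radiant.data at all, here is a minimal sketch of a hand-written helper. The name `weighted_sd` is hypothetical (it is not from any package), and the formula is an assumption: weights normalized to sum to 1, with no small-sample correction. It reproduces the w_sd shown above for the vs = 0, cyl = 6 group (about 0.646), but check it against your own data before relying on it.

```r
library(dplyr)

# Hypothetical helper, not part of any package.
# Assumption: weights are normalized to sum to 1 and no
# small-sample (n - 1 style) correction is applied.
weighted_sd <- function(x, w, na.rm = TRUE) {
  if (na.rm) {
    keep <- !is.na(x) & !is.na(w)  # drop pairs where either value is missing
    x <- x[keep]
    w <- w[keep]
  }
  w <- w / sum(w)            # normalize the weights
  m <- sum(w * x)            # weighted mean
  sqrt(sum(w * (x - m)^2))   # square root of the weighted variance
}

# Weighted mean and weighted SD side by side, one row per group
res <- mtcars |>
  group_by(vs, cyl) |>
  summarize(w_mean = weighted.mean(mpg, w = hp),
            w_sd   = weighted_sd(mpg, hp),
            .groups = "drop")
res
```

Passing the bare column names (mpg, hp) rather than df$x and df$weights matters here: inside summarize() the bare names refer to the current group's rows, whereas df$x always refers to the whole column.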

Kieran
  • Thanks for your valuable input. Upvoted. However, I think you need to explain how the weights are chosen (here `wt = hp`), as this is especially important in meta-analysis. I believe `wt` should equal the mean of each study × the number of patients in each study. Please let me know if you agree. Thanks again – Mohamed Rahouma Jun 19 '23 at 12:58