
I have the following data (Code 1) and want to calculate weighted p-values. I reviewed dplyr summarise multiple columns using t.test, but my version needs to use weights. I can do this for a single column with Code 2, but there are over 30 columns. How do I calculate the weighted p-values efficiently?

Code 1

# A tibble: 877 x 5
   cat     population farms farmland weight
   <chr>        <dbl> <dbl>    <dbl>  <dbl>
 1 Treated       9.89  8.00     12.3  1    
 2 Control      10.3   7.81     12.1  0.714
 3 Control      10.2   8.04     12.4  0.156
 4 Control      10.3   7.97     12.1  0.340
 5 Control      10.9   8.87     12.7  2.85 
 6 Control      10.4   8.35     12.5  0.934
 7 Control      10.5   8.58     12.9  0.193
 8 Control      10.6   8.57     12.6  0.276
 9 Control      10.2   8.54     12.5  0.344
10 Control      10.5   8.76     12.6  0.625
# … with 867 more rows

Code 2

library(weights)

# weighted t-test for a single column; coefficients[3] is the p-value
wtd.t.test(
  x = df$population[df$cat == "Treated"],
  y = df$population[df$cat == "Control"],
  weight  = df$weight[df$cat == "Treated"],
  weighty = df$weight[df$cat == "Control"])$coefficients[3]
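One way to see what Code 2 repeats for every column is to wrap it in a small helper; this is only an illustrative sketch (the name wtd_p is made up), assuming wtd.t.test comes from the weights package:

library(weights)

# illustrative helper: weighted-t-test p-value for one numeric column,
# split into Treated vs Control by cat and weighted by w
wtd_p <- function(x, cat, w) {
  wtd.t.test(
    x = x[cat == "Treated"],
    y = x[cat == "Control"],
    weight  = w[cat == "Treated"],
    weighty = w[cat == "Control"])$coefficients[3]
}

# e.g. wtd_p(df$population, df$cat, df$weight)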

1 Answer


We can use summarise with across from dplyr:

library(dplyr)
df %>%
  summarise(across(population:farmland,
    ~ weights::wtd.t.test(x = .x[cat == 'Treated'],
                          y = .x[cat == 'Control'],
                          weight  = weight[cat == 'Treated'],
                          weighty = weight[cat == 'Control'])$coefficients[3]))
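This returns a single-row tibble with one weighted p-value per selected column. If a long layout (one row per variable) is easier to work with, that result can be reshaped with tidyr::pivot_longer; res here just stands for the output of the summarise() call above:

library(tidyr)

# res = the one-row tibble returned by the summarise()/across() call above
res %>%
  pivot_longer(everything(), names_to = "variable", values_to = "p.value")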

Or using sapply over the numeric columns:

sapply(df[2:4], function(v)
  weights::wtd.t.test(x = v[df$cat == "Treated"],
                      y = v[df$cat == "Control"],
                      weight  = df$weight[df$cat == "Treated"],
                      weighty = df$weight[df$cat == "Control"])$coefficients[3])
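Because sapply() iterates over the columns of df[2:4], this returns a named numeric vector of p-values. A purrr sketch of the same idea, if that style is preferred, could look like this:

library(purrr)

# map_dbl() loops over the columns of df[2:4] and returns a named numeric vector
map_dbl(df[2:4], ~ weights::wtd.t.test(
  x = .x[df$cat == "Treated"],
  y = .x[df$cat == "Control"],
  weight  = df$weight[df$cat == "Treated"],
  weighty = df$weight[df$cat == "Control"])$coefficients[3])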