confidence intervals for a tibble in wide format

Question

I have a large tibble, an example of which is shown below. It has seven predictors (V4 to V10) and nine outcomes (w1, w2, w3, mw, i1, i2, i3, mi, p2).
What I am trying to do is create confidence intervals for the outcomes in columns 2 (w1) to 10 (p2).

vars    w1    w2      w3      mw      i1      i2      i3      mi      p2
V4  0.084   0.017   0.061   0.054   22.800  4.570   16.700  14.700  0.367
V5  0.032   0.085   0.039   0.052   8.840   23.100  10.700  14.200  0.367
V6  0.026   0.066   0.022   0.038   7.030   18.000  6.070   10.400  0.367
V7  0.097   0.020   0.066   0.061   26.300  5.420   18.100  16.600  0.367
V8  0.048   0.071   0.043   0.054   13.100  19.300  11.800  14.700  0.367
V9  0.018   0.111   0.020   0.050   4.800   30.300  5.440   13.500  0.367
V10 0.053   0.020   0.103   0.058   14.300  5.330   28.000  15.900  0.367
V4  0.084   0.017   0.060   0.054   22.400  4.420   16.200  14.300  0.373
V5  0.032   0.072   0.036   0.047   8.630   19.300  9.760   12.500  0.373
V6  0.030   0.076   0.023   0.043   8.080   20.500  6.070   11.500  0.373
V7  0.080   0.021   0.087   0.063   21.500  5.720   23.300  16.800  0.373
V8  0.053   0.090   0.034   0.059   14.100  24.000  9.110   15.700  0.373
V9  0.016   0.101   0.025   0.048   4.410   27.100  6.790   12.800  0.373
V10 0.060   0.022   0.100   0.061   16.000  5.950   26.800  16.300  0.373

When I group_by variables (vars) in dplyr and run quantiles on three of the outcomes (as a test), it does not give me what I'm looking for. Instead of giving me the confidence intervals for the three outcomes, it just gives me one confidence interval as seen below:

+   group_by(vars) %>% 
+   do(data.frame(t(quantile(c(.$w1, .$w2, .$w3), probs = c(0.025, 0.975)))))
# A tibble: 7 x 3
# Groups:   variables [7]
  variables  X2.5 X97.5
1 V10       0.0202 0.103 
2 V4        0.017  0.084 
3 V5        0.032  0.0834
4 V6        0.0221 0.0748
5 V7        0.0201 0.0958
6 V8        0.0351 0.0876
7 V9        0.0162 0.110

In short, what I'm looking for is something like the table below, where I get the confidence intervals for each outcome.

         w1                w2                    w3 
vars X2.5   X97.5   vars  X2.5  X97.5  vars X2.5    X97.5
V10 0.020   0.103   V10 0.020   0.103   V10 0.020   0.103
V4  0.017   0.084   V4  0.017   0.084   V4  0.017   0.084
V5  0.032   0.083   V5  0.032   0.083   V5  0.032   0.083
V6  0.022   0.075   V6  0.022   0.075   V6  0.022   0.075
V7  0.020   0.096   V7  0.020   0.096   V7  0.020   0.096
V8  0.035   0.088   V8  0.035   0.088   V8  0.035   0.088
V9  0.016   0.110   V9  0.016   0.110   V9  0.016   0.110

Any pointers in the right direction would be greatly appreciated. I've read on StackOverflow, but can't seem to find an answer that addresses what I want to do.

`aggregate(df1[-1], list(df1[[1]]), quantile, probs = c(0.025, 0.975))`, where `df1` is the name of your data set. — Rui Barradas, Oct 12 '19 at 17:32

score 2 · Accepted Answer · answered Oct 12 '19 at 17:41

Here are two ways.

Base R.

aggregate(df1[-1], list(df1[[1]]), quantile, probs = c(0.025, 0.975))
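
One thing to be aware of with this call: because `quantile()` returns two values per group, `aggregate()` stores each outcome as a two-column matrix inside the result rather than as separate columns. A minimal sketch of flattening that into ordinary columns with `do.call(data.frame, ...)` follows. It uses a small simulated data frame (`toy`, an illustrative name standing in for `df1`), so the numbers are not from the question.

```r
# Simulated stand-in for df1: two groups, five rows each (illustrative data only)
set.seed(1)
toy <- data.frame(vars = rep(c("V4", "V5"), each = 5),
                  w1 = runif(10),
                  w2 = runif(10))

res <- aggregate(toy[-1], list(vars = toy[[1]]), quantile,
                 probs = c(0.025, 0.975))

# Each outcome is a matrix column holding both quantiles
is.matrix(res$w1)

# Flatten the matrix columns into separate lower/upper columns
flat <- do.call(data.frame, res)
str(flat)
```

After flattening, each outcome contributes two plain numeric columns, which is usually easier to print, export, or join.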

With the tidyverse.

library(dplyr)

df1 %>%
  group_by(vars) %>%
  mutate_at(vars(w1:p2), quantile, probs = c(0.025, 0.975))

Note that the second approach produces a different output format: the first quantile (0.025) is in the first rows and the second (0.975) is in the last rows.
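
The `mutate_at()` call depends on each group having exactly two rows, as in the sample data, because `mutate` needs every result to have length one or the group size. A sketch that should not depend on group size is to summarise each bound separately. It assumes dplyr 1.0.0 or later for `across()` and uses a small simulated data frame (`toy`, an illustrative name standing in for `df1`); the `lwr`/`upr` suffixes are arbitrary labels.

```r
library(dplyr)

# Simulated stand-in for df1 with more than two rows per group (illustrative data only)
set.seed(1)
toy <- data.frame(vars = rep(c("V4", "V5"), each = 5),
                  w1 = runif(10),
                  w2 = runif(10))

ci <- toy %>%
  group_by(vars) %>%
  summarise(across(w1:w2,
                   list(lwr = ~ quantile(.x, 0.025),
                        upr = ~ quantile(.x, 0.975))),
            .groups = "drop")

# One row per group; columns are named <outcome>_lwr and <outcome>_upr
ci
```

With the full data, `w1:w2` would become `w1:p2`. The lower and upper bounds of each outcome end up next to each other, which is close to the layout requested in the question.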

Data.

df1 <-
structure(list(vars = structure(c(2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), 
.Label = c("V10", "V4", "V5", "V6", "V7", "V8", 
"V9"), class = "factor"), w1 = c(0.084, 0.032, 
0.026, 0.097, 0.048, 0.018, 0.053, 0.084, 
0.032, 0.03, 0.08, 0.053, 0.016, 0.06), 
w2 = c(0.017, 0.085, 0.066, 0.02, 0.071, 0.111, 
0.02, 0.017, 0.072, 0.076, 0.021, 0.09, 0.101, 
0.022), w3 = c(0.061, 0.039, 0.022, 0.066, 
0.043, 0.02, 0.103, 0.06, 0.036, 0.023, 0.087, 
0.034, 0.025, 0.1), mw = c(0.054, 0.052, 0.038, 
0.061, 0.054, 0.05, 0.058, 0.054, 0.047, 0.043, 
0.063, 0.059, 0.048, 0.061), i1 = c(22.8, 8.84, 
7.03, 26.3, 13.1, 4.8, 14.3, 22.4, 8.63, 8.08, 
21.5, 14.1, 4.41, 16), i2 = c(4.57, 23.1, 18, 5.42, 
19.3, 30.3, 5.33, 4.42, 19.3, 20.5, 5.72, 24, 27.1, 
5.95), i3 = c(16.7, 10.7, 6.07, 18.1, 11.8, 5.44, 
28, 16.2, 9.76, 6.07, 23.3, 9.11, 6.79, 26.8), 
mi = c(14.7, 14.2, 10.4, 16.6, 14.7, 13.5, 15.9, 
14.3, 12.5, 11.5, 16.8, 15.7, 12.8, 16.3), 
p2 = c(0.367, 0.367, 0.367, 0.367, 0.367, 0.367, 
0.367, 0.373, 0.373, 0.373, 0.373, 0.373, 0.373, 
0.373)), class = "data.frame", 
row.names = c(NA, -14L))

The base R code works perfectly and does what I needed. However, the dplyr code throws an error when I run it: `df1 %>% group_by(vars) %>% mutate_at(vars(w1:p2), quantile, probs = c(0.025, 0.975))` gives `Error: Column w1 must be length 1000 (the group size) or one, not 2` — GSA, Oct 13 '19 at 03:12

score 1 · Answer 2 · answered Oct 12 '19 at 18:06

Another possibility: melt/pivot to long format, compute the summaries, then cast/pivot back to wide format.

library(tidyverse)
df2 <- (df1 
     %>% pivot_longer(-vars, names_to = "outcome", values_to = "value") 
     %>% group_by(vars,outcome) 
     %>% summarise(lwr=quantile(value,0.025),upr=quantile(value,0.975))
)

df2 %>% pivot_wider(names_from=outcome,values_from=c(lwr,upr))

Unfortunately the columns aren't in the order you want. I can't think of a quick fix, although you can use select() with the variables listed in the order you want.
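
One possible way to get the bounds of each outcome side by side without a manual `select()` is the `names_vary` argument of `pivot_wider()`. This is a sketch under the assumption that tidyr 1.2.0 or later is installed (the argument does not exist in earlier versions) and that dplyr is 1.0.0 or later for `.groups`. The data frame `toy` is simulated and stands in for `df1`.

```r
library(dplyr)
library(tidyr)

# Simulated stand-in for df1 (illustrative data only)
set.seed(1)
toy <- data.frame(vars = rep(c("V4", "V5"), each = 5),
                  w1 = runif(10),
                  w2 = runif(10))

wide <- toy %>%
  pivot_longer(-vars, names_to = "outcome", values_to = "value") %>%
  group_by(vars, outcome) %>%
  summarise(lwr = quantile(value, 0.025),
            upr = quantile(value, 0.975),
            .groups = "drop") %>%
  # "slowest" keeps lwr and upr of the same outcome adjacent
  pivot_wider(names_from = outcome, values_from = c(lwr, upr),
              names_vary = "slowest")

names(wide)
```

Outcomes appear in the order they occur after `summarise()`, which sorts them alphabetically, so a `select()` may still be needed if a specific outcome order matters.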

Your code works; it just estimates the lower quantiles and groups them together before the upper quantiles. As you suggested, I can use select to reorder them. — GSA, Oct 13 '19 at 03:15
