
I am trying to create a data frame whose columns are two grouping variables plus the mean and the lower and upper confidence limits for a large number of measures, as described here:

How to pass more complex functions to summarise_if or mutate_if?

Here is a sample of the dataset I am using (the real dataset has far more outcomes and rows):

      var1 var2 outcome1  outcome2 outcome3 outcome4 outcome5
1  0000999  214        0 0.0000000      0.0       10       82
2  0000999  214        0 0.0000000      0.0       11       88
3  0000999  214        0 0.0000000      0.0       10       90
4  0000999  214        0 0.0000000      0.0        5       45
5  0001382  214       13 0.7647059      1.5       12       36
6  0001382  214        0 0.0000000      0.0        7       46
7  0001382  214        8 1.0000000      1.5        7       51
8  0001382  214        0 0.0000000      0.0        0        2
9  0001382  214       16 1.0000000      1.5       15       55
10 0001950  214        7 0.8750000      1.5        6       43
11 0001950  214        0 0.0000000      0.0        8       59
12 0001950  214        0 0.0000000      0.0        3      105
13 0001950  214        0 0.0000000      0.0        1       65
14 0001957  214        0 0.0000000      0.0        3       30
15 0001957  214        0 0.0000000      0.0        8       57
16 0001957  214        5 0.7142857      1.5        4       78
17 0001957  214        0 0.0000000      0.0        3       36
18 0010610  214        0 0.0000000      0.0        1        8
19 0021726  215        0 0.0000000      0.0        8       67
20 0021726  215        0 0.0000000      0.0       15       87
21 0021726  215        0 0.0000000      0.0        8       79
22 0021726  215       14 0.7368421      3.0       12      106
23 0021726  215        0 0.0000000      0.0        0       11
24 0022908  215        0 0.0000000      0.0        1       41
25 0022908  215        0 0.0000000      0.0        0        0

The code I am using is:

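library(dplyr)
library(gmodels)  # assumption: ci() is gmodels::ci() (estimate, lower, upper, std. error)

# lower confidence limit of a numeric vector
lci <- function(data) {
  as.numeric(ci(data)[2])
}

# upper confidence limit of a numeric vector
uci <- function(data) {
  as.numeric(ci(data)[3])
}

data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  select(var1, var2, sort(current_vars()))  # sorts into lci, mean, uci for each outcome var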

which when printed gives

# A tibble: 7 x 17
# Groups:   var1 [7]
  var1   var2 outcome1_lci outcome1_mean outcome1_uci outcome2_lci outcome2_mean outcome2_uci outcome3_lci outcome3_mean
  <chr> <int>        <dbl>         <dbl>        <dbl>        <dbl>         <dbl>        <dbl>        <dbl>         <dbl>
1 0000~   214         0             0            0          0              0            0            0             0    
2 0001~   214        -1.71          7.4         16.5       -0.0851         0.553        1.19        -0.120         0.9  
3 0001~   214        -3.82          1.75         7.32      -0.477          0.219        0.915       -0.818         0.375
4 0001~   214        -2.73          1.25         5.23      -0.390          0.179        0.747       -0.818         0.375
5 0010~   214       NaN             0          NaN        NaN              0          NaN          NaN             0    
6 0021~   215        -4.97          2.8         10.6       -0.262          0.147        0.557       -1.07          0.6  
7 0022~   215         0             0            0          0              0            0            0             0    
# ... with 7 more variables: outcome3_uci <dbl>, outcome4_lci <dbl>, outcome4_mean <dbl>, outcome4_uci <dbl>,
#   outcome5_lci <dbl>, outcome5_mean <dbl>, outcome5_uci <dbl>

However, the lower confidence limits are often below zero, which is physically impossible for these data. So I tried adding a conditional mutate to reset them to zero in that case, e.g.:

data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  mutate_at(vars(contains("lci")), case_when(.<0 ~ 0, TRUE ~ .)) %>%
  select(var1, var2, sort(current_vars()))  # sorts into lci, mean, uci for each outcome var

which returns:

Error: `TRUE ~ (.)` must be length 119 or one, not 17

Can anyone with more experience using scoped functions tell me what I'm doing wrong and what I should do instead?
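
One likely explanation, offered as a sketch rather than a confirmed diagnosis: mutate_at() expects a function as its second argument, but case_when(. < 0 ~ 0, TRUE ~ .) is a call that gets evaluated immediately, and inside a %>% pipeline a bare . refers to the whole summarised tibble. The comparison . < 0 then yields a 7 x 17 logical matrix (119 values), while the right-hand side of TRUE ~ . has the tibble's length of 17, which matches the numbers in the error message. Passing a function instead avoids this. The toy table agg and its values below are illustrative stand-ins, not the real output, and pmax() is one possible way to clamp at zero:

```r
library(dplyr)

# Toy stand-in for the summarised table (illustrative values only)
agg <- tibble(
  var1          = c("a", "b", "c"),
  outcome1_lci  = c(0, -1.71, -3.82),
  outcome1_mean = c(0, 7.4, 1.75),
  outcome1_uci  = c(0, 16.5, 7.32)
)

# Pass a function (not an already-evaluated call) to mutate_at();
# pmax(x, 0) replaces every value below zero with zero, element-wise
clamped <- agg %>%
  mutate_at(vars(contains("lci")), function(x) pmax(x, 0))
```

In the real pipeline this would take the place of the mutate_at() line. Wrapping the original expression as funs(case_when(. < 0 ~ 0, TRUE ~ .)) should behave similarly in dplyr versions where funs() is still available.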

Mel