I need to generate reports for different groups and compute mean values for different subgroups. However, the number of cases is very small, so t.test() throws an error.

Here's an example:

library(tidyverse)
n <- 10

test_df <- data.frame(var = sample(1:5, size=n, replace=TRUE),
                      grp = sample(c("A", "B", "C"), size = n, replace=TRUE))

# Throws an error when the numbers are too small:
#
# test_df %>%
#   group_by(grp) %>%
#   summarise(mean = mean(var, na.rm=TRUE),
#             n = n(),
#             uci = t.test(var, conf.level = .95)$conf.int[[2]],
#             lci = t.test(var, conf.level = .95)$conf.int[[1]])


# Try to avoid the error by checking for sd(var) == 0

test_df %>%
  group_by(grp) %>%
  summarise(mean = mean(var, na.rm=TRUE),
            n = n(),
            uci = if(sd(var) == 0) NA else t.test(var, conf.level = .95)$conf.int[[2]],
            lci = if(sd(var) == 0) NA else t.test(var, conf.level = .95)$conf.int[[1]])

I tried checking for sd(var) == 0 to prevent the error from occurring. However, this does not solve the problem completely (if you execute the code many times, it will still throw an error). What conditions do I have to check for?

D. Studer
  • You should a) reconsider your statistical methodology (ever heard of alpha error inflation in multiple testing?) and b) [use `tryCatch`](https://stackoverflow.com/a/12195574/1412059). – Roland Aug 24 '23 at 08:22
  • Could you please make an example using tryCatch? – D. Studer Aug 24 '23 at 08:26

1 Answer

You should use tryCatch for catching and handling errors. Adapted to your code:

test_df %>%
  group_by(grp) %>%
  summarise(mean = mean(var, na.rm=TRUE),
            n = n(),
            uci = tryCatch(t.test(var, conf.level = .95)$conf.int[[2]], 
                           error = \(e) {warning(paste(format(e), collapse = " in ")); NA}),
            lci = tryCatch(t.test(var, conf.level = .95)$conf.int[[1]], 
                           error = \(e) {warning(paste(format(e), collapse = " in ")); NA})
  )
    
## A tibble: 3 × 5
#  grp    mean     n   uci   lci
#  <chr> <dbl> <int> <dbl> <dbl>
#1 A         3     3  7.30 -1.30
#2 B         5     2 NA    NA   
#3 C         3     5  4.52  1.48
#Warning message:
#There were 2 warnings in `summarise()`.
#The first warning was:
#ℹ In argument: `uci = tryCatch(...)`.
#ℹ In group 2: `grp = "B"`.
#Caused by warning in `value[[3L]]()`:
#! data are essentially constant in t.test.default(var, conf.level = 0.95)
#ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning. 

I prefer to re-raise the caught errors as warnings in the error handlers, but that's optional, of course.
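To avoid calling t.test() twice per group, the tryCatch logic could be wrapped in a small helper. This is only a sketch: safe_ci() is a hypothetical name, not a function from any package. The two cases it covers are the ones t.test() stops on for a single sample: fewer than two observations ("not enough 'x' observations") and zero spread ("data are essentially constant"). The first case also explains why the sd(var) == 0 check alone is not enough: for a group of size one, sd() returns NA and the if() itself fails.

```r
# safe_ci() is a hypothetical helper, not part of any package.
# It runs t.test() once and returns both bounds, or NAs if t.test() stops.
safe_ci <- function(x, conf.level = 0.95) {
  tryCatch(
    {
      ci <- t.test(x, conf.level = conf.level)$conf.int
      c(lci = ci[[1]], uci = ci[[2]])
    },
    error = function(e) {
      # Re-raise the error as a warning so the failing groups stay visible.
      warning(conditionMessage(e), call. = FALSE)
      c(lci = NA_real_, uci = NA_real_)
    }
  )
}

set.seed(1)  # assumed seed, only to make the sketch reproducible
test_df <- data.frame(var = sample(1:5, size = 10, replace = TRUE),
                      grp = sample(c("A", "B", "C"), size = 10, replace = TRUE))

# One row per group; suppressWarnings() only keeps the demo output quiet.
ci_by_grp <- suppressWarnings(
  t(sapply(split(test_df$var, test_df$grp), safe_ci))
)
ci_by_grp
```

Inside summarise(), the same helper could be used as lci = safe_ci(var)[["lci"]] and uci = safe_ci(var)[["uci"]], at the cost of two calls again.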

Also, I believe the confidence intervals might be too narrow because of alpha-error inflation from computing one interval per group.
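One conservative way to widen the intervals, assuming a Bonferroni correction is acceptable for the report (this is my assumption, not something the code above does), is to raise the per-group confidence level to 1 - alpha / k, where k is the number of groups. The data below are made up purely for illustration:

```r
# Illustrative data only: three groups of five observations each.
groups <- list(A = c(2, 4, 3, 5, 1),
               B = c(3, 3, 4, 5, 2),
               C = c(1, 2, 2, 4, 5))

alpha    <- 0.05
k        <- length(groups)   # number of intervals reported together
conf_adj <- 1 - alpha / k    # Bonferroni-adjusted per-group confidence level

x <- groups$A
ci_plain <- t.test(x, conf.level = 1 - alpha)$conf.int
ci_adj   <- t.test(x, conf.level = conf_adj)$conf.int

# The adjusted interval is wider than the unadjusted one.
rbind(plain = ci_plain, adjusted = ci_adj)
```

Whether such an adjustment is appropriate depends on how the intervals are interpreted in the report.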

Roland