Run function for multiple weighted t tests on subsets of dataframe (R)

Question

I am running a function for multiple weighted t-tests on different subsets of a dataframe. My function is essentially the following:


library(weights)

group_list <- list(unique(df$group))

t_tests <- for (g in group_list) {
  wtd.t.test(
    x = df[df$group == g, ]$var2[df[df$group == g, ]$var1 == "A"],
    y = df[df$group == g, ]$var2[df[df$group == g, ]$var1 == "B"],
    weight = df[df$group == g, ]$weight[df[df$group == g, ]$var1 == "A"],
    weighty = df[df$group == g, ]$weight[df[df$group == g, ]$var1 == "B"],
    samedata = FALSE
  )
}

Here, var2 is the outcome variable of interest. I want to test whether the mean of var2 differs significantly between var1 = "A" and var1 = "B", and to run this test on each subset of the data defined by the values of the variable group.

Running the above code produces this error:

Error in wtd.t.test(x = df[df$group == g, : object 'out' not found

Have I structured the function improperly? How can I run this weighted t-test on every subset of the dataframe?

UPDATE: New approach using nested tibbles as suggested

My new approach is the following:

library(weights)
library(tidyverse)

df %>% 
  nest(-group) %>% 
  mutate(fit = map(data, ~ wtd.t.test(x=.%>%filter(var1 == "A")$var2,y=.%>% filter(var1 == "B")$var2,
weight=.%>% filter(var1 == "A")$weight,weighty=.%>% filter(var1 == "B")$weight,samedata=FALSE)),
         results = map(fit, glance)) %>% 
  unnest(results)

The new error message is:


Error in `mutate()`:
ℹ In argument: `fit = map(...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `weight / mean(weight, na.rm = TRUE)`:
! non-numeric argument to binary operator
Backtrace:
  1. ... %>% unnest(results)
 10. purrr::map(...)
 11. purrr:::map_("list", .x, .f, ..., .progress = .progress)
 15. .f(.x[[i]], ...)
 16. weights::wtd.t.test(...)

All of my variables are numeric other than var1, which is not used in the calculations, so I am unclear why this error message appears. Any suggestions would be much appreciated.
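A likely explanation for the `non-numeric argument` error, offered as a sketch and not as a confirmed diagnosis: in magrittr, a pipeline that starts with a bare `.` is not run on the data; it builds a function. Because `$` binds more tightly than `%>%`, an expression such as `. %>% filter(var1 == "A")$weight` is a pipeline of that kind, so `weight` receives a function instead of a numeric vector. The example below uses mtcars and base `subset()` as stand-ins for the real data, and `get_wt` is a hypothetical helper name.

```r
library(magrittr)

# A pipeline whose left-hand side is a bare `.` is a function, not a result.
f <- . %>% subset(gear == 3)
is.function(f)   # TRUE

# Calling that function on data returns the filtered rows.
nrow(f(mtcars))

# Naming the data explicitly and extracting the column as the last step
# yields a numeric vector that could be passed as `weight`.
get_wt <- function(d) d %>% subset(gear == 3) %>% getElement("wt")
is.numeric(get_wt(mtcars))   # TRUE
```

Inside a purrr lambda, the same idea would be to write `filter(.x, var1 == "A")$weight`, which is what the reformatted code further down does.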

If I reformat the code as the following:

df %>% 
  nest(-group) %>% 
  mutate(fit = map(data, ~ wtd.t.test(x=filter(.,var1 == "A")$var2,y=filter(.,var1 == "B")$var2,
weight=filter(.,var1 == "A")$weight,weighty=filter(.,var1 == "B")$weight,samedata=FALSE)),
         results = map(fit, glance)) %>% 
  unnest(results)

Now the error becomes:

Error in `mutate()`:
ℹ In argument: `fit = map(...)`.
Caused by error in `map()`:
ℹ In index: 1.
Caused by error in `wtd.t.test()`:
! object 'out' not found
Backtrace:
  1. ... %>% unnest(results)
 10. purrr::map(...)
 11. purrr:::map_("list", .x, .f, ..., .progress = .progress)
 15. .f(.x[[i]], ...)
 16. weights::wtd.t.test(...)

UPDATE 2

Here is the new code updated with a reproducible example:


library(weights)
library(tidyverse)

mtcars %>% 
  nest(-cyl) %>% 
  mutate(fit = map(data, ~ wtd.t.test(x=.%>%filter(gear == 3)$disp,y=.%>% filter(gear == 4)$disp,
weight=.%>% filter(gear == 3)$wt,weighty=.%>% filter(gear == 4)$wt,samedata=FALSE)),
         results = map(fit, glance)) %>% 
  unnest(results)

and reformatted:


mtcars %>% 
  nest(-cyl) %>% 
  mutate(fit = map(data, ~ wtd.t.test(x=filter(.,gear == 3)$disp,y=filter(.,gear == 4)$disp,
weight=filter(.,gear == 3)$wt,weighty=filter(.,gear == 4)$wt,samedata=FALSE)),
         results = map(fit, glance)) %>% 
  unnest(results)

I'd suggest reading [Running a model on separate groups](https://drsimonj.svbtle.com/running-a-model-on-separate-groups). It's a much tidier, cleaner approach using nested tibbles, and can be adapted to running t-tests. For more help you'll need to [make your question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including some example data. — neilfws, May 24 '23 at 04:38
Many thanks for this suggestion @neilfws I have adapted my approach and question based on your idea. It is indeed much cleaner, and seems closer to working, but I am still getting error messages (see update above). Any suggestions on how to fix this? — flâneur, May 25 '23 at 16:58
Also updated with a reproducible example from the mtcars dataset. — flâneur, May 25 '23 at 17:21
Not sure, but maybe the issue is that you're comparing x and y vectors of different lengths? — neilfws, May 26 '23 at 01:19
Thanks for the idea. That shouldn't be the case, because two-sample t-tests can have different sample sizes. — flâneur, May 26 '23 at 01:40

score 0 · Accepted Answer · answered May 26 '23 at 03:22

For those interested, a solution (using the mtcars dataset as example data) is the following:

library(tidyverse)
library(weights)
df_list <- split(mtcars, mtcars$cyl)
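# Run one weighted t-test (gear 3 vs. gear 4) on a single subset
multiple_wt_ttest <- function(df) {
  ttest = wtd.t.test(x = subset(df, gear == 3)$disp,
                     y = subset(df, gear == 4)$disp,
                     weight = subset(df, gear == 3)$wt,
                     weighty = subset(df, gear == 4)$wt,
                     samedata = FALSE)
  # Keep the second element of the result; note that `<<-` also
  # stores it in a global variable named `out`
  out <<- ttest[2]
}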

data_store <- do.call(rbind, sapply(df_list,multiple_wt_ttest))

This yields a data frame containing the t-test statistics for each subset of the data, one per level of cyl.
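A variant of the approach above, offered as a sketch and not as the answer author's method: it returns each result directly instead of assigning with `<<-`, and it skips subsets in which either comparison group is empty. In mtcars there are no 8-cylinder cars with 4 gears, and an empty group is a plausible trigger for the `object 'out' not found` error, although that is not confirmed here. The helper name `weighted_ttest_one` is hypothetical, and the sketch assumes the list returned by `wtd.t.test()` has a `coefficients` element.

```r
library(weights)  # provides wtd.t.test()

# Hypothetical helper: one weighted t-test (gear 3 vs. gear 4) on one subset.
weighted_ttest_one <- function(d) {
  a <- d[d$gear == 3, ]
  b <- d[d$gear == 4, ]
  # Skip subsets where either comparison group has no rows.
  if (nrow(a) == 0 || nrow(b) == 0) return(NULL)
  res <- wtd.t.test(x = a$disp, y = b$disp,
                    weight = a$wt, weighty = b$wt,
                    samedata = FALSE)
  # Assumption: `coefficients` is a named numeric vector of test statistics.
  as.data.frame(t(res$coefficients))
}

# One result per level of cyl; rbind() ignores the NULL entries.
results <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), weighted_ttest_one))
results
```

Returning `NULL` for skipped subsets keeps the final `rbind()` simple; returning a row of `NA` values instead would keep every level of cyl visible in the output.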
