Robust Independent T-test

Question

This is my first time asking a question, so I apologize for any formatting issues or anything that makes this difficult to answer. Please let me know what I need to add so that the question can be answered.

I'm attempting to compare two groups of very unequal size (one ~97, the other ~714). The reason for the large discrepancy is that I am looking at a program completed by one class, to see whether its results differ significantly from those of previous classes. I've been reading about robust statistics recently and decided to use Yuen's bootstrap test (yuenbt) from the WRS2 package in RStudio, hoping for a more valid comparison given the difference in sample size.

My formula is

yuenbt(DataExample$PT500 ~ DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)

and it returns

Call:
yuenbt(formula = DataExample$PT500 ~ DataExample$ClassPT500,
tr = 0.2, nboot = 599, side = TRUE)

Test statistic: NA (df = NA), p-value = 0

Trimmed mean difference: -65
95 percent confidence interval:
NA NA

NAs are also returned for other variables I've tried, and in some cases the confidence interval is Inf. Any ideas why this is happening (is it the big difference in sample size?), and suggestions for the next best step, would be greatly appreciated.

Here is a sample of data:

structure(list(PrePT500 = c(74, 105, 121, 128), PostPT500 = c(191, 
264, 327, 314), PT500 = c(117, 159, 206, 186), PrePullups = c(0, 
NA, NA, 2), PostPullups = c(3, NA, NA, 3), Pullups = c(3, NA, 
NA, 1), PreSitups = c(46, 40, 25, 33), PostSitups = c(41, 61, 
39, 49), Situps = c(-5, 21, 14, 16), PreMC = c(8, 16, 29, 19), 
    PostMC = c(41, 45, 60, 60), MC = c(33, 29, 31, 41), PrePushups = c(20, 
    16, 28, 30), PostPushups = c(40, 47, 50, 50), Pushups = c(20, 
    31, 22, 20), Pre1.5 = c(1048, 917, 902, 905), Post1.5 = c(846, 
    748, 696, 760), X1.5 = c(-202, -169, -206, -145), Pre220 = c(43, 
    50, 41, 45), Post220 = c(39, 40, 32, 34), X220 = c(-4, -10, 
    -9, -11), PreAgility = c(20.96, NA, 21.1, 19.88), PostAgility = c(19.69, 
    NA, 18.8, 20.79), Agility = c(-1.27, NA, -2.3, 0.91), PreBD = c(6.17, 
    7.82, 5.08, 7), PostBD = c(5, 4.87, 4.68, 6.2), BD = c(-1.17, 
    -2.95, -0.4, -0.8), PreCL = c(7.05, 13.6, 14.4, 8.8), PostCL = c(8.1, 
    8.9, 8.27, 7.6), CL = c(1.05, -4.7, -6.13, -1.2), PreSW = c(10.2, 
    NA, 20.34, 8), PostSW = c(11.4, NA, 9.3, 7.4), SW = c(1.2, 
    NA, -11.04, -0.6), Pre500 = c(115, 128, 107, 114), Post500 = c(105, 
    112, 93, 99), X500 = c(-10, -16, -14, -15), PreTotal = c(446, 
    91, 255, NA), PostTotal = c(493, 439, 503, NA), Total = c(47, 
    348, 248, NA), ClassPrePT500 = c(338, 213, 215, 243), ClassPostPT500 = c(430, 
    396, 333, 314), ClassPT500 = c(92, 183, 118, 71), ClassPrePullups = c(6, 
    5, 2, 0), ClassPostPullups = c(13, 7, 15, 0), ClassPullups = c(7, 
    2, 13, 0), ClassPreSitups = c(59, 42, 45, 53), ClassPostSitups = c(75, 
    70, 51, 53), ClassSitups = c(16, 28, 6, 0), ClassPreMC = c(60, 
    43, 31, 48), ClassPostMC = c(60, 60, 31, 60), ClassMC = c(0, 
    17, 0, 12), ClassPrePushups = c(50, 37, 26, 30), ClassPostPushups = c(50, 
    50, 47, 34), ClassPushups = c(0, 13, 21, 4), ClassPre1.5 = c(803, 
    810, 803, 741), ClassPost1.5 = c(700, 690, 664, 661), Class1.5 = c(-103, 
    -120, -139, -80), ClassPre220 = c(32, 41, 31, 40), ClassPost220 = c(31, 
    33, 30, 37), Class220 = c(-1, -8, -1, -3), ClassPreAgility = c(19, 
    23, 18, 22.1), ClassPostAgility = c(16.4, 18, 16.5, 20.3), 
    ClassAgility = c(-2.6, -5, -1.5, -1.8), ClassPreBD = c(6.4, 
    8.5, 5.8, 11.2), ClassPostBD = c(5.3, 5.8, 5.5, 7.5), ClassBD = c(-1.1, 
    -2.7, -0.3, -3.7), ClassPreCL = c(7.8, 9.3, 7.3, 9.6), ClassPostCL = c(7.6, 
    7.4, 7.4, 9.2), ClassCL = c(-0.2, -1.9, 0.100000000000001, 
    -0.4), ClassPreSW = c(8.5, 8.4, 7.7, NA), ClassPostSW = c(7.8, 
    8.1, 7.6, 8), ClassSW = c(-0.7, -0.300000000000001, -0.100000000000001, 
    NA), ClassPre500 = c(102, 104, 100, 108), ClassPost500 = c(94, 
    88, 98, 101), Class500 = c(-8, -16, -2, -7), ClassPreTotal = c(495, 
    418, 528, 264), ClassPostTotal = c(561, 539, 562, 482), ClassTotal = c(66, 
    121, 34, 218)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Thank you in advance for any help.

Hi Tibial, welcome to Stack Overflow. It will be much easier to help if you provide at least a sample of your data for each group with `dput(DataExample[1:10,])`. You can edit your question and paste the output. You can surround it with three backticks (```) for better formatting. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. — Ian Campbell, Apr 14 '20 at 02:12
@Ian Campbell, updated the post as per your request. Let me know if there is anything else! — TibialCuriosity, Apr 14 '20 at 08:10
when you do yuenbt(DataExample$PT500 ~ DataExample3$ClassPT500 ...) the variable on the right side needs to be a factor, something that groups your dependent variable — StupidWolf, Apr 18 '20 at 10:04
in the example dataset you provided, DataExample3$ClassPT500 looks continuous and you cannot test that. Did you test the wrong column? — StupidWolf, Apr 18 '20 at 10:04
@StupidWolf both of your responses solved this for me! I did not pick up that it needed to be a factor on the right, thank you for your help! — TibialCuriosity, Apr 21 '20 at 03:08
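Following the comments above, the fix is to give yuenbt's formula a grouping factor on the right-hand side instead of a second continuous column. Below is a minimal sketch, assuming that PT500 (this program's class) and ClassPT500 (previous classes) are two columns of the same data frame, NA-padded where the groups differ in size. The helper to_long() and the labels "program"/"previous" are hypothetical. It reuses the four sample rows from the question, which are far too few for a meaningful bootstrap, so it only demonstrates the shape of the call; WRS2::yuenbt(score ~ group, data = long, ...) is run only if WRS2 is installed.

```r
# Sample values copied from the question's dput() output
DataExample <- data.frame(
  PT500      = c(117, 159, 206, 186),  # assumed: this program's class
  ClassPT500 = c(92, 183, 118, 71)     # assumed: previous classes
)

# Hypothetical helper: stack two outcome columns into one numeric
# column plus a factor recording which group each value came from.
to_long <- function(df, col_a, col_b, labels = c(col_a, col_b)) {
  a <- df[[col_a]]
  b <- df[[col_b]]
  a <- a[!is.na(a)]  # drop NA padding when the groups differ in size
  b <- b[!is.na(b)]
  data.frame(
    score = c(a, b),
    group = factor(rep(labels, times = c(length(a), length(b))),
                   levels = labels)
  )
}

long <- to_long(DataExample, "PT500", "ClassPT500",
                labels = c("program", "previous"))
str(long)

# Formula interface: numeric outcome on the left, two-level factor on the right
if (requireNamespace("WRS2", quietly = TRUE)) {
  set.seed(1)  # bootstrap results vary between runs without a seed
  res <- WRS2::yuenbt(score ~ group, data = long,
                      tr = 0.2, nboot = 599, side = TRUE)
  print(res)
}
```

With the real data (~97 and ~714 rows) the same reshaping applies; the important part is that group is a factor with exactly two levels.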

score -1 · Answer 1 · edited Oct 07 '20 at 20:22

The R function yuenbt(x, y, tr=0.2, alpha=0.05, nboot=599, side=F) computes a $1-\alpha$ confidence interval for $\mu_{t1}-\mu_{t2}$ (the difference between the two population trimmed means) using the bootstrap-t method, where the default amount of trimming (tr) is 0.2, the default value for $\alpha$ is 0.05, and the default value for nboot ($B$) is 599. So far, simulations suggest that, in terms of probability coverage, there is little or no advantage to using $B > 599$ when $\alpha = 0.05$. However, there is no recommended choice of $B$ when $\alpha < 0.05$, simply because little is known about how the bootstrap-t performs in that case. Finally, the default value of side is FALSE, meaning that the equal-tailed two-sided confidence interval is used; side=TRUE gives the symmetric two-sided confidence interval.
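To make the bootstrap-t description above concrete, here is a simplified base-R sketch of the procedure: center each group at its trimmed mean, resample, compute a Yuen-type t statistic for each bootstrap sample, then invert the bootstrap distribution into either an equal-tailed (side = FALSE) or a symmetric (side = TRUE) interval. The function names winvar_sketch(), trimse_sketch() and yuen_bt_sketch() are hypothetical; this is an illustration modeled on the description, not the source code of WRS or WRS2. In this sketch, a bootstrap sample with zero Winsorized variance (for example, heavily tied data) yields an undefined statistic, and it is not handled specially.

```r
# Winsorized variance: pull the lowest and highest g values in to the
# nearest retained value, then take the ordinary variance.
winvar_sketch <- function(x, tr = 0.2) {
  x <- sort(x)
  n <- length(x)
  g <- floor(tr * n)
  if (g > 0) {
    x[1:g] <- x[g + 1]
    x[(n - g + 1):n] <- x[n - g]
  }
  var(x)
}

# Standard error of the tr-trimmed mean
trimse_sketch <- function(x, tr = 0.2) {
  sqrt(winvar_sketch(x, tr)) / ((1 - 2 * tr) * sqrt(length(x)))
}

yuen_bt_sketch <- function(x, y, tr = 0.2, alpha = 0.05,
                           nboot = 599, side = FALSE) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  est <- mean(x, trim = tr) - mean(y, trim = tr)
  se  <- sqrt(trimse_sketch(x, tr)^2 + trimse_sketch(y, tr)^2)
  # Center each group at its trimmed mean so the null hypothesis holds
  xc <- x - mean(x, trim = tr)
  yc <- y - mean(y, trim = tr)
  tstar <- replicate(nboot, {
    bx <- sample(xc, replace = TRUE)
    by <- sample(yc, replace = TRUE)
    (mean(bx, trim = tr) - mean(by, trim = tr)) /
      sqrt(trimse_sketch(bx, tr)^2 + trimse_sketch(by, tr)^2)
  })
  if (side) {
    # Symmetric interval: a single critical value taken from |T*|
    crit <- sort(abs(tstar))[floor((1 - alpha) * nboot + 0.5)]
    ci <- c(est - crit * se, est + crit * se)
  } else {
    # Equal-tailed interval: lower and upper quantiles of T*
    ts <- sort(tstar)
    lo <- ts[floor(alpha * nboot / 2 + 0.5)]
    hi <- ts[floor((1 - alpha / 2) * nboot + 0.5)]
    ci <- c(est - hi * se, est - lo * se)
  }
  list(estimate = est, se = se, ci = ci)
}
```

For example, yuen_bt_sketch(x, y, side = TRUE) returns an interval centered on the trimmed-mean difference, while side = FALSE can be asymmetric around it.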

Try:

yuenbt(DataExample$PT500, DataExample3$ClassPT500, tr = 0.2, nboot = 599, side = TRUE)

Hi @supcumps, OP gets an NA because a continuous variable was wrongly used as a grouping factor. See comments. How does setting the parameter solve the problem? — StupidWolf, Oct 07 '20 at 21:45
Hi, change the mode of comparison from ‘~’ to ‘,’; that seems to work for comparing two groups. — supcumps, Oct 08 '20 at 21:12
