Apply analysis on multiple columns in a dataset split by factor, including continuous and categorical data

Question

I am looking to apply t-tests to many columns in a dataset, split by a factor, using R. I found a solution here: Apply t-test on many columns in a dataframe split by factor

This code is taken from the above question:

df <- read.table(text="Group   var1    var2    var3    var4    var5
1           3   5   7   3   7
1           3   7   5   9   6
1           5   2   6   7   6
1           9   5   7   0   8
1           2   4   5   7   8
1           2   3   1   6   4
2           4   2   7   6   5
2           0   8   3   7   5
2           1   2   3   5   9
2           1   5   3   8   0
2           2   6   9   0   7
2           3   6   7   8   8
2           10  6   3   8   0", header = TRUE)


t(sapply(df[-1], function(x)
  unlist(t.test(x ~ df$Group)[c("estimate", "p.value", "statistic", "conf.int")])))

The result:

     estimate.mean in group 1 estimate.mean in group 2   p.value statistic.t conf.int1 conf.int2
var1                 4.000000                 3.000000 0.5635410   0.5955919 -2.696975  4.696975
var2                 4.333333                 5.000000 0.5592911  -0.6022411 -3.104788  1.771454
var3                 5.166667                 5.000000 0.9028444   0.1249164 -2.770103  3.103436
var4                 5.333333                 6.000000 0.7067827  -0.3869530 -4.497927  3.164593
var5                 6.500000                 4.857143 0.3053172   1.0925986 -1.803808  5.089522

This is exactly what I was after. However, my dataset also includes categorical data, such as sex and diagnosis (which has more than two categories).

Is there a way to incorporate this into the above code? I am new to stats, but I believe a chi-squared test is used to test for differences in categorical data?

If this cannot be incorporated into the previous code, then a separate code to test the categorical data and produce a similar result would also be a great help.

Any help would be greatly appreciated.

Thanks, Tom

EDIT:

Thanks for your replies.

I am working with transplant data and want to compare outcomes between patients who were on or off bypass at surgery. I am not sure of the best way to show my data, so I have copied this from a .csv file; hopefully it comes across okay.

Group,Age,Sex,Height,Weight,Diagnosis,Blood loss,Intubation time,Survival
On bypass,59,Male,165,102,Diagnosis 1,57,53,29
On bypass,44,Female,164,140,Diagnosis 1,114,15,35
On bypass,45,Male,165,119,Diagnosis 2,118,31,81
On bypass,26,Male,178,125,Diagnosis 1,171,36,31
On bypass,41,Female,177,105,Diagnosis 1,76,53,91
On bypass,43,Male,161,119,Diagnosis 3,97,38,63
Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
Off bypass,63,Male,172,132,Diagnosis 2,32,46,10

I was planning to first ensure there is no significant difference between my two groups in terms of age, sex, height, weight and diagnosis.

I was then going to test the outcomes of the patients, including blood loss, intubation time and survival.

Could anyone advise on the best test to use for this analysis? And, if possible, provide some help with the code to run it in R?

Thanks again, Tom

Perhaps you can post the sample data with those variables and what t.test/chisq.test you want run? — Gopala, May 07 '16 at 21:59
What are you attempting to analyze? Difference of group means/variances? Test of independence? Correlation? — Parfait, May 07 '16 at 23:41
I suspect you want to use a factorial ANOVA, but it's not clear what you are trying to test. This page might help you select a test: http://www.ats.ucla.edu/stat/stata/whatstat/whatstat.htm. Be aware that multiple t-tests are generally to be avoided because they inflate your family-wise error rate. — abind-off, May 08 '16 at 12:01
Thank you for your comments, I have edited my original post and provided more information. Thanks, Tom. — tomclark, May 08 '16 at 20:40
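
On the multiple-comparisons point raised in the comments, one common option is to adjust the per-column p-values. The sketch below reuses the toy data from the question and applies base R's p.adjust() with the Holm method. Holm is only one reasonable choice, and the names p_raw and p_holm are illustrative, not from the original post.

```r
# Toy data from the question: a Group column plus five numeric columns
df <- read.table(text = "Group var1 var2 var3 var4 var5
1  3 5 7 3 7
1  3 7 5 9 6
1  5 2 6 7 6
1  9 5 7 0 8
1  2 4 5 7 8
1  2 3 1 6 4
2  4 2 7 6 5
2  0 8 3 7 5
2  1 2 3 5 9
2  1 5 3 8 0
2  2 6 9 0 7
2  3 6 7 8 8
2 10 6 3 8 0", header = TRUE)

# One Welch t-test per column, keeping only the p-value
p_raw <- sapply(df[-1], function(x) t.test(x ~ df$Group)$p.value)

# Holm adjustment (an assumed choice) controls the family-wise error rate
# across the five tests
p_holm <- p.adjust(p_raw, method = "holm")

# Raw and adjusted p-values side by side
round(cbind(p_raw, p_holm), 4)
```

Adjusted p-values are never smaller than the raw ones, so a column that looks borderline on its own may no longer stand out once all the tests are considered together.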

score 0 · Answer 1 · answered May 09 '16 at 03:09

It's worth consulting a good text on matched subjects designs, but assuming you already have or will, this (and what you already have above) should help you do what you need to do in R:

 df <- read.table(text="Group, Age, Sex, Height, Weight, Diagnosis, Blood loss, Intubation time, Survival
                 On bypass,59,Male,165,102,Diagnosis 1,57,53,29
                 On bypass,44,Female,164,140,Diagnosis 1,114,15,35
                 On bypass,45,Male,165,119,Diagnosis 2,118,31,81
                 On bypass,26,Male,178,125,Diagnosis 1,171,36,31
                 On bypass,41,Female,177,105,Diagnosis 1,76,53,91
                 On bypass,43,Male,161,119,Diagnosis 3,97,38,63
                 Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
                 Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
                 Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
                 Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
                 Off bypass,63,Male,172,132,Diagnosis 2,32,46,10", header = TRUE, sep = ",", strip.white = TRUE)  # strip.white drops the indentation from the Group labels

library(dplyr)

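# tally number of participants in each Group by Sex
tab <- tally(group_by(df, Group, Sex))
# cross-tabulate the counts first: chisq.test() on the bare vector tab$n
# would run a goodness-of-fit test, not a test of Group by Sex independence
chisq.test(xtabs(n ~ Group + Sex, data = tab))  # small counts: consider fisher.test()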

# do any of these variables differ by Group?
summary(manova(cbind(Age, Height, Weight) ~ Group, data = df))

# investigate all main effects
summary(aov(Survival ~ ., data = df))

# what about some main effects and interactions?
summary(aov(Survival ~ (Group+Age+Sex)^2, data = df))
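
To get a single table like the one in the question that also covers the categorical columns, one possibility is to choose the test from the column type. This is a minimal sketch, not the answer's method: test_column is a hypothetical helper, and using a Welch t-test for numeric columns and Fisher's exact test for categorical ones is an assumption made here because the cell counts are small.

```r
# Transplant data from the question; read.csv turns "Blood loss" into Blood.loss
df <- read.csv(text = "Group,Age,Sex,Height,Weight,Diagnosis,Blood loss,Intubation time,Survival
On bypass,59,Male,165,102,Diagnosis 1,57,53,29
On bypass,44,Female,164,140,Diagnosis 1,114,15,35
On bypass,45,Male,165,119,Diagnosis 2,118,31,81
On bypass,26,Male,178,125,Diagnosis 1,171,36,31
On bypass,41,Female,177,105,Diagnosis 1,76,53,91
On bypass,43,Male,161,119,Diagnosis 3,97,38,63
Off bypass,53,Female,164,139,Diagnosis 1,125,49,51
Off bypass,26,Female,165,137,Diagnosis 3,29,7,86
Off bypass,30,Male,174,121,Diagnosis 1,174,43,100
Off bypass,59,Female,174,133,Diagnosis 1,40,16,43
Off bypass,63,Male,172,132,Diagnosis 2,32,46,10", strip.white = TRUE)

# Hypothetical helper: pick a test based on the column type
test_column <- function(x, group) {
  if (is.numeric(x)) {
    res <- t.test(x ~ group)             # Welch two-sample t-test
    data.frame(test = "Welch t-test", p.value = res$p.value)
  } else {
    res <- fisher.test(table(group, x))  # exact test on the Group-by-category table
    data.frame(test = "Fisher exact", p.value = res$p.value)
  }
}

# Apply the helper to every column except Group and stack the one-row results
results <- do.call(rbind, lapply(names(df)[-1], function(v) {
  cbind(variable = v, test_column(df[[v]], df$Group))
}))

results
```

Each row of results names the variable, the test that was run, and its p-value, so baseline characteristics (Age, Sex, Height, Weight, Diagnosis) and outcomes can be read from the same table.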

Thank you very much, I have applied this to my data set and it has worked perfectly. Thanks, Tom. — tomclark, May 09 '16 at 21:26
