
I have the following data frame, which I'm calling "test", and I am trying to run Bartlett's test and a Kruskal-Wallis test for each "metab" column against "diagnosis".

> test

Index  tube.label  age  gender  diagnosis      metab1  metab2   metab3  metab4  metab5  metab6
1             200   73  Male    Cancer              6     1.5        2       5       8     1.5
2             201   71  Male    Healthy             6     1.5        2    11.5      50     1.5
4             202   76  Male    Adenoma             2     1.5        2       5       8     1.5
7             203   58  Female  Cancer              2     1.5        2     1.5     2.5     1.5
9             204   73  Male    Cancer              2     1.5        2     1.5       8     1.5
12            205   72  Male    Healthy             6     1.5  17.8272    13.5   184.2     4.5
13            206   46  Female  Cancer        30.0530     1.5        2    21.2    16.6     4.5
14            207   38  Female  Healthy             6     1.5        2  12.494   31.59     1.5
15            208   60  Male    Cancer              6     1.5        2    13.2    53.2     4.5
16            209   72  Female  Cancer              6     1.5        2     1.5       8     1.5
17            210   72  Male    Adenoma             6     1.5        2  22.829  102.44   9.069
18            211   52  Male    Cancer              6     1.5        2     1.5       8     1.5
19            212   64  Male    Healthy             6     1.5        2     1.5       8     1.5
20            213   68  Male    Cancer              6     1.5        2  26.685    40.9     4.5
21            214   60  Male    Healthy        24.902     1.5   42.443  22.942   498.5     4.5
23            215   70  Female  Healthy             6     1.5        2     1.5  19.908     4.5
24            216   42  Female  Healthy             6     1.5        2     1.5    17.7     1.5
25            217   72  Male    Inflammation        6     1.5        2     1.5       8     1.5
26            218   71  Male    Healthy            51     1.5        2  41.062   182.2  11.340
27            219   51  Female  Inflammation        2     1.5        2     1.5       8     1.5

I can run each test individually and it gives the expected result:

bartlett.test(metab1 ~ diagnosis, data = test)

    Bartlett test of homogeneity of variances

data:  metab1 by diagnosis
Bartlett's K-squared = 5.1526, df = 3, p-value = 0.161

kruskal.test(metab1 ~ diagnosis, data = test)

    Kruskal-Wallis rank sum test

data:  metab1 by diagnosis
Kruskal-Wallis chi-squared = 4.3475, df = 3, p-value = 0.2263

However, when I try to run them in a for loop (I have more than 100 columns to test), I keep getting the following errors:

Bartlett error:

testcols <- colnames(test[6:ncol(test)])
for (met in testcols){
  bartlett.test(met ~ diagnosis, data = test)
}

>Error in model.frame.default(formula = met ~ diagnosis, data = test) : 
  variable lengths differ (found for 'diagnosis')

Kruskal-Wallis error:

for(met in testcols){
  kruskal.test(met ~ diagnosis,data = test)
}

>Error in model.frame.default(formula = met ~ diagnosis, data = test) : 
  variable lengths differ (found for 'diagnosis')

Should I be using something else? Thank you for the help!

vanish007
  • Same technique that would work with `lm` should work for this function as well: https://stackoverflow.com/questions/46493011/r-loop-for-variable-names-to-run-linear-regression-model or https://stackoverflow.com/questions/37314006/how-to-use-loop-to-do-linear-regression-in-r – MrFlick Aug 07 '20 at 01:38

1 Answer


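In your loop, met is a character string such as "metab1", so the formula met ~ diagnosis compares a single string (length 1) with the whole diagnosis column instead of using the column named by the string. That is why you get "variable lengths differ". Build the formula from the column name with reformulate instead: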

cols <- names(test)[6:ncol(test)]

all_test <- lapply(cols, function(x) 
                    bartlett.test(reformulate("diagnosis", x), data = test))

You can do the same with kruskal.test.
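
For illustration, here is a minimal sketch of the Kruskal-Wallis version that also collects one p-value per column. It assumes a small made-up data frame in place of your `test` (the values are random, not your data), and the names `kw` and `kw_p` are hypothetical:

```r
# Made-up stand-in for `test`; values are illustrative only
set.seed(1)
test <- data.frame(
  diagnosis = rep(c("Cancer", "Healthy", "Adenoma"), each = 6),
  metab1    = rnorm(18),
  metab2    = rnorm(18, sd = 2)
)

# Select the metabolite columns by name
cols <- grep("^metab\\d+", names(test), value = TRUE)

# reformulate("diagnosis", response = x) builds the formula  x ~ diagnosis
kw <- lapply(cols, function(x)
  kruskal.test(reformulate("diagnosis", response = x), data = test))
names(kw) <- cols

# Extract one p-value per column into a named numeric vector
kw_p <- vapply(kw, function(res) res$p.value, numeric(1))
kw_p
```

Naming the list by column makes it easy to look up a single result later, e.g. `kw[["metab1"]]`. Note that a for loop would also need to store or print each result explicitly, since values computed inside a loop are not auto-printed.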

Ronak Shah
  • You can use this to filter column names, just in case there are other columns after your metabs: `cols <- names(test)[grepl("^metab\\d.*$", names(test))]` – Paul Aug 07 '20 at 01:44
  • Yes, `cols <- grep("^metab\\d+", names(test), value = TRUE)` is probably shorter, but I'll leave it to OP if they want to use it. – Ronak Shah Aug 07 '20 at 01:46
  • Thanks Ronak, I think I need to strengthen my understanding of when to use apply / lapply. Thanks again for the help. – vanish007 Aug 07 '20 at 15:12