
I am new to R. I want to run kruskal.test on my data frame, which has 50 rows and 76 columns. Part of the data frame looks like this:

status  -1  Actinomyces Parascardovia   Corynebacterium Rothia  Bifidobacterium
KnownDiabeetic  0.313151767 0.000101245 0   0   0   0.055077453
KnownDiabeetic  0.549817041 0   0   0   0.000104548 0.018609514
KnownDiabeetic  0.176596177 0   0   0   0   0.036498577
KnownDiabeetic  0.100851409 0.000405433 0   0   0.000101358 0.04054328
KnownDiabeetic  0.073431511 0.000100867 0   0   0   0.070808957
KnownDiabeetic  0.335514698 0   0   0.000103875 0   0.089539836
KnownDiabeetic  0.307456901 0   0   0   0   0.007242681
KnownDiabeetic  0.090503247 0.000202922 0   0   0   0.002029221
KnownDiabeetic  0.401858774 0   0   0   0   0.00323265
KnownDiabeetic  0.256320658 0.000513875 0   0   0.002980473 0.028057554
KnownDiabeetic  0.02540743  0.00020245  0   0   0.000404899 0.120558761
KnownDiabeetic  0.191452468 0.001631987 0   0   0.000101999 0.374745002
KnownDiabeetic  0.230440533 0.002645233 0   0   0.001017397 0.274086886
KnownDiabeetic  0.328139322 0.001425807 0.000203687 0   0.000407373 0.319890009
KnownDiabeetic  0.026437135 0.000307409 0   0   0.00215186  0.22625269
KnownDiabeetic  0.273827688 0   0   0   0   0.009154715
NewlyDiagnosed  0.57150086  0   0   0   0.000101204 0.001012043
NewlyDiagnosed  0.565323565 0   0   0   0.00010175  0.089336589
NewlyDiagnosed  0.355542096 0   0   0   0   0.001312336
NewlyDiagnosed  0.446341716 0.000206975 0   0   0   0.050191452

I am trying to run kruskal.test iteratively to find out whether there is a statistically significant difference in each bacterial genus (columns 2:76) across the grouping variable (status). I am using the following R script:

mydf<-Kruskal_genus_open_test 
kruskal.wallis.table <- data.frame()
for(i in seq(along=mydf[,1]))  {
    ## Run the KW test on on gene
    x <- as.vector(as.matrix(Kruskal_genus_open_test[i,]))
    ks.test <- kruskal.test(x, g=PCS_map$Description)
    ## Store the result in the data frame
    kruskal.wallis.table <- rbind(kruskal.wallis.table,
                                  data.frame(id=training.filtered.probe.names[i],
                                             p.value=ks.test$p.value
                                  ))
    ## Report number of genes tested
    verbose(paste("Kruskal-Wallis test for gene ", i, "/", 
                  training.filtered.probe.nb, "; p-value=", ks.test$p.value, sep=""))
}

But I am getting the following error:

Error in kruskal.test.default(x, g = PCS_map$Description) : 'x' and 'g' must have the same length

Please help to sort out this issue.

Thank You,

IRTFM
  • Can you make your [problem reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Roman Luštrik Apr 12 '14 at 07:04

1 Answer


If you only want to get the p-value for each test, the following should work just fine:

apply(mydf[,-1], 2, function(x) kruskal.test(x,mydf[,1])$p.value)
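
The error in the question most likely comes from looping over rows: `mydf[i,]` holds one sample's values across all columns, so its length is the number of columns and need not match the length of the grouping vector. Below is a minimal sketch of the column-wise approach on made-up data. The `toy` data frame and its values are hypothetical stand-ins for the real table, and `sapply` is swapped in for `apply` only so the data frame is not coerced to a matrix first.

```r
# Hypothetical toy data standing in for the real 50 x 76 table
set.seed(1)
toy <- data.frame(
  status          = rep(c("KnownDiabeetic", "NewlyDiagnosed"), each = 10),
  Actinomyces     = runif(20),
  Rothia          = runif(20),
  Bifidobacterium = runif(20)
)

# One Kruskal-Wallis test per genus column, grouped by status.
# Here x is a whole column, so it has one value per sample, just like the grouping vector.
p.values <- sapply(toy[, -1], function(x) kruskal.test(x, toy$status)$p.value)

# Collect the results in the id / p.value layout used in the question
kruskal.wallis.table <- data.frame(id = names(p.values), p.value = unname(p.values))
print(kruskal.wallis.table)
```

With the real data, replacing `toy` with `mydf` should give one row per genus, assuming the first column is the grouping variable and all remaining columns are numeric.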
Benoit