I have a 15 x 555 data frame with samples in rows and proteins in columns. The last three columns hold mapping information (Treatment, Treatment_Time, and Month) and are labeled as such.

Looping over the data frame column-wise, I want to run a Wilcoxon test on each protein column with wilcox.test, selecting the two groups directly from df based on the mapping columns.

Rough example:

pre_post <- vector()
for(i in names(df[,1:552])){
    pre_post <- append(pre_post, wilcox.test(df[df$Treatment_Time %in% "Pre", i], df[df$Treatment_Time %in% "Post", i], na.action(na.omit))$p.value)
}

The expected result is a vector of length 552 holding the p-values of the Wilcoxon tests. If a test cannot be completed, I would like to insert an NA instead.

This script works until a column has no values for one subset of the data, such as Post, and then fails with an error. I tried guarding against this with an if/else on the length of each subset in the column, but I can't get it to work.

for(i in names(df[,1:552])){
    if(length(df[df$Treatment_Time %in% "Pre", i])>1 & length(df[df$Treatment_Time %in% "Post", i])>1){
        pre_post <- append(pre_post, wilcox.test(df[df$Treatment_Time %in% "Pre", i], df[df$Treatment_Time %in% "Post", i], na.action(na.omit))$p.value)
    }
    else{     
    all_amb_all_delay <- append(all_amb_all_delay, "NA")
    }
}

Any help would be appreciated, thanks!

knd
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. What do you want to happen in the case where such values don't exist? – MrFlick Mar 09 '18 at 19:54
  • Your code produces no results regardless since the implicit call to `print()` does not happen within a loop. You need to explicitly use the `print()` function or add the values to a vector to preserve the results. Look at the difference between `for (i in 1:5) 1 + i` and `for (i in 1:5) print(1 + i)`. Your code will also be simpler if you use the formula method for `wilcox.test()`. Read the manual pages: `?Control` and `?wilcox.test`. – dcarlson Mar 09 '18 at 20:17

1 Answer

Consider wrapping the call in tryCatch so that NA is returned whenever wilcox.test throws an error, for example when a filter yields zero rows. The code below uses sapply to return the p-values as a vector.

p_value_vector <- sapply(names(df[,1:552]), function(i) 
    tryCatch(
      # The default (x, y) method of wilcox.test drops NAs itself;
      # na.action only applies to the formula method, so it is omitted here.
      wilcox.test(df[df$Treatment_Time %in% "Pre", i], 
                  df[df$Treatment_Time %in% "Post", i])$p.value,
      warning = function(w) return(NA),
      error = function (e) return(NA)
    )
)
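
To make the idea easy to try without the original data, here is a minimal sketch on a small made-up data frame. The names `toy`, `prot_a`, `prot_b`, `prot_c`, and `safe_wilcox_p` are hypothetical and only serve the illustration; the real data frame has 552 protein columns. This variant traps errors only, which is a deliberate difference from the code above: a warning such as the one `wilcox.test` raises for ties would still yield a p-value rather than NA.

```r
# Toy data (hypothetical): 6 samples, 3 protein columns, and one mapping column.
set.seed(1)
toy <- data.frame(
  prot_a = rnorm(6),
  prot_b = c(rnorm(3), NA, NA, NA),  # every "Post" value is missing
  prot_c = rnorm(6),
  Treatment_Time = rep(c("Pre", "Post"), each = 3)
)

# Return the p-value, or NA_real_ if wilcox.test stops with an error
# (for example when one group has no non-missing observations).
safe_wilcox_p <- function(x, y) {
  tryCatch(
    wilcox.test(x, y)$p.value,
    error = function(e) NA_real_
  )
}

protein_cols <- setdiff(names(toy), "Treatment_Time")
is_pre  <- toy$Treatment_Time %in% "Pre"
is_post <- toy$Treatment_Time %in% "Post"

# vapply enforces one numeric value per column and keeps the column names.
p_values <- vapply(
  protein_cols,
  function(col) safe_wilcox_p(toy[is_pre, col], toy[is_post, col]),
  numeric(1)
)

print(p_values)
```

In this sketch `prot_b` should come back as NA because its Post group is empty once missing values are dropped, while the other two columns get ordinary p-values. Using `NA_real_` and `vapply` keeps the result a named numeric vector, so it can be passed straight to something like `p.adjust()`.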
Parfait