
I have an OTU table containing 888 taxa from 58 samples. Rows = OTUs and columns = samples, like this:

Otus <- data.frame(S_1 = c(0, 0, 1), S_2 = c(12, 0, 5), S_3 = c(0, 5, 3), row.names = c("OTU_1", "OTU_2", "OTU_3"))
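Whichever dissimilarity ends up being used, base R's dist() (like vegan's vegdist()) computes dissimilarities between the rows of a matrix. With OTUs in rows, as in the table above, the table has to be transposed so that samples become rows before computing sample-to-sample distances. The same applies to the matrix returned by getVarianceStabilizedData(), which keeps OTUs in rows like the count matrix. A minimal sketch with the toy table; the object names otu_mat and d_euc are only illustrative:

```r
# Toy OTU table from the question: rows = OTUs, columns = samples
Otus <- data.frame(S_1 = c(0, 0, 1), S_2 = c(12, 0, 5), S_3 = c(0, 5, 3),
                   row.names = c("OTU_1", "OTU_2", "OTU_3"))

# dist() compares rows, so transpose to get samples as rows
otu_mat <- t(as.matrix(Otus))
rownames(otu_mat)  # "S_1" "S_2" "S_3"

# Euclidean distances between samples (the default method of dist())
d_euc <- dist(otu_mat)
round(as.matrix(d_euc), 2)
```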

I moved it from phyloseq to DESeq2 using the phyloseq_to_deseq2 command after seeing this answer. Now that it is a DESeq2 object, I have normalized the OTU table with the variance stabilizing transformation using the following command:

getVarianceStabilizedData(side_dds) #side_dds is a DESeq2 object

I would like to create a PCoA plot of my samples after normalization, so I have to calculate a distance (dissimilarity) matrix. However, my go-to method, Bray-Curtis, does not allow negative entries and throws the warning: "results may be meaningless because data have negative entries in method “bray”."
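Why negative entries are a problem can be seen from the Bray-Curtis formula documented for vegan's vegdist(method = "bray"): the sum of absolute differences between two samples divided by the sum of all their values. A base-R sketch of that formula (bray_curtis is a hypothetical helper, not a package function) shows that non-negative counts give a value between 0 and 1, whereas negative entries can shrink the denominator and push the result above 1 or even below 0:

```r
# Bray-Curtis dissimilarity between two samples, following the formula
# documented for vegan's vegdist(method = "bray");
# bray_curtis() is a hypothetical helper, not a package function
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Non-negative counts (S_1 and S_2 from the toy table): 16 / 18
bc_counts <- bray_curtis(c(0, 0, 1), c(12, 0, 5))

# Negative entries: the denominator can shrink or even turn negative
bc_above_one <- bray_curtis(c(-1, 2), c(1, -1))    # 5 / 1 = 5
bc_negative  <- bray_curtis(c(-2, 1), c(0, 0.5))   # 2.5 / -0.5 = -5
```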

In my search I found this post, which has no answer, and this post, which concludes that adding a constant to the data is discouraged.

Can anyone provide me with help?

EDIT: I have run out of comment space (see below), but would like to share the comment by Bastian Schiffthaler with others:

It would be good if you could upload a reproducible workflow, or your data. The error you are getting is because BC dissimilarity should be calculated on non-negative values, but VST can produce negatives. You can use assay(vsd) - min(assay(vsd)), since your data is now homoscedastic and (pseudo-)log2 transformed. This essentially just moves the zero offset. Just don't try to relate the values back to any "real-world" meaning. Since your data is zero-inflated, also have a look at e.g. bioconductor.org/packages/release/bioc/html/zinbwave.html for preprocessing instead of VST.
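The shift described in this comment can be sketched as subtracting the global minimum, which makes the smallest entry exactly zero. The sketch uses a small hypothetical matrix in place of assay(vsd) (samples in rows, not real VST output) and a hand-written bray_curtis helper as a stand-in for a library call. It also illustrates the objection raised in the answer below: Bray-Curtis is not invariant to adding a constant, so the dissimilarity between two samples changes with whatever extra offset is chosen.

```r
# Hypothetical stand-in for transformed data: 3 samples (rows) x 3 OTUs,
# containing negative values as a VST output may
vst_like <- rbind(S_1 = c(-1.2,  0.5, 2.0),
                  S_2 = c( 3.1, -0.4, 2.6),
                  S_3 = c(-1.2,  2.2, 1.7))

# Subtract the global minimum so that the smallest entry becomes 0
shifted <- vst_like - min(vst_like)
stopifnot(all(shifted >= 0))

# Hand-written Bray-Curtis helper (hypothetical, not a package function)
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Add a further arbitrary constant k and recompute S_1 vs S_2:
# the numerator stays the same, but the denominator grows with k
bc_with_offset <- function(k) {
  bray_curtis(shifted["S_1", ] + k, shifted["S_2", ] + k)
}
bc_values <- sapply(c(0, 1, 10), bc_with_offset)
round(bc_values, 3)
```

Since no particular offset is privileged, any Bray-Curtis value computed this way partly reflects the arbitrary constant rather than the samples themselves.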

1 Answer


You cannot use Bray-Curtis with negative data. Bray-Curtis is a compositional dissimilarity index: all entries of a composition must be non-negative, and the sum for each sampling unit must be positive. You cannot combine a variance-stabilizing transformation with Bray-Curtis; you must make a choice. Many dissimilarities work with such transformed data, starting with Euclidean and Manhattan distances. If you are stabilizing variances (whatever that means), you may well use Euclidean distances: they are closely related to variance and are the default in the standard R function dist (and then you need not use PCoA, but can directly use PCA on your data). Manhattan distances (also in dist) are similar in scope to Bray-Curtis. If you insist on Bray-Curtis, you must either skip the pre-transformation or use a transformation that is compatible with Bray-Curtis. Adding an arbitrary value to your transformed data is not such a transformation: it lets you compute Bray-Curtis, but the resulting numbers do not define a composition, and Bray-Curtis is not adequate for such data.

Jari Oksanen
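The answer's remark that Euclidean distances make PCoA unnecessary, because PCA can be applied directly, can be checked in base R: classical PCoA with cmdscale() on Euclidean distances reproduces the PCA scores from prcomp() up to the sign of each axis. A minimal sketch on a random hypothetical matrix standing in for transformed data (not the asker's data), with Manhattan distances from dist() shown as the alternative the answer mentions:

```r
set.seed(1)
# Hypothetical stand-in for transformed data: 6 samples (rows) x 4 OTUs
X <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("S_", 1:6), paste0("OTU_", 1:4)))

# PCoA (classical multidimensional scaling) on Euclidean distances
pcoa <- cmdscale(dist(X, method = "euclidean"), k = 2)

# PCA on the same data, centred but not scaled
pca <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:2]

# Both ordinations agree up to the sign of each axis
isTRUE(all.equal(abs(unname(pcoa)), abs(unname(pca))))

# Manhattan distances, also available in base R's dist()
d_man <- dist(X, method = "manhattan")
```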
  • Thank you very much for your response! I have found a similar question with an answer: https://stackoverflow.com/questions/58883214/can-i-use-a-subset-of-results-from-deseq2-to-calculate-bray-curtis-dissimilarity (comment by Bastian Schiffthaler). They argue that adding a value (i.e. min(assay), the lowest value of the data) is reasonable because the data are now homoscedastic and pseudo-log2 transformed, so adding that value "just moves the zero offset". I had not thought about Euclidean or Manhattan distances and will take a look! – TorpCatBioInf Apr 22 '23 at 17:47
  • My answer was a bit (a big bit) puristic. Of course you can do anything and get a useful result. However, if you specifically want a "variance stabilizing" transformation, that is supposed to be a prelude to variance-based analysis, which in ordination means PCA. You can do it differently and even get a more meaningful result. Without seeing your data, we can't tell. – Jari Oksanen Apr 22 '23 at 20:30