
I am trying to apply a centred log-ratio (CLR) transform in R to a distance matrix produced by a qiime2 pipeline. I loaded the matrix directly from a qiime2 output file, converted it to a numeric matrix, and checked that it is symmetric.

structure(list(X1 = c(0, 0.2177609998, 0.2133361674, 0.1549136105, 
0.1400395799), X11 = c(0.2177609998, 0, 0.07805820645, 0.1418994689, 
0.1934668819), X12 = c(0.2133361674, 0.07805820645, 0, 0.1475390242, 
0.1857477705), X13 = c(0.1549136105, 0.1418994689, 0.1475390242, 
0, 0.1046740994), X14 = c(0.1400395799, 0.1934668819, 0.1857477705, 
0.1046740994, 0)), row.names = c("X1", "X11", "X12", "X13", "X14"
), class = "data.frame")
dm <- read.csv(file = "dm.csv", header = TRUE, row.names = 1)
isSymmetric(as.matrix(dm))
# [1] TRUE
dm_matrix <- data.matrix(dm, rownames.force = NA)

I then transformed the matrix with the clr() function from the 'compositions' package, using dm_matrix_clr <- data.frame(clr(dm_matrix)). This produced an output, but the output was not symmetric. My understanding is that the transformed distance matrix should be symmetric like the input. Identical values on opposite sides of the diagonal are being transformed into different outputs.

[Image: transformed distance matrix]

Any help on how to resolve this problem, so that I end up with a symmetric transformed distance matrix, would be greatly appreciated.
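
One plausible cause of the asymmetry described above can be reproduced in base R. The sketch assumes that `clr()` from 'compositions' treats each row as a separate composition and leaves zeros untransformed; that is an assumption about the package, so check its documentation. `rowwise_clr` is a hypothetical helper written for illustration, not the package's code. Because each row is centred by its own mean log, the equal entries d[i, j] and d[j, i] receive different offsets.

```r
# First three samples from the question's distance matrix (symmetric)
ids <- c("X1", "X11", "X12")
dm_matrix <- matrix(
  c(0,            0.2177609998,  0.2133361674,
    0.2177609998, 0,             0.07805820645,
    0.2133361674, 0.07805820645, 0),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Hypothetical helper: CLR applied to each row separately.
# Assumption: zeros are skipped and stay 0 in the output.
rowwise_clr <- function(m) {
  t(apply(m, 1, function(row) {
    pos <- row > 0
    # centre the logs of the positive entries by this row's own mean log
    row[pos] <- log(row[pos]) - mean(log(row[pos]))
    row
  }))
}

res <- rowwise_clr(dm_matrix)

isSymmetric(dm_matrix)  # TRUE
isSymmetric(res)        # FALSE: each row was centred by a different value

# The same input distance ends up as two different outputs
res["X1", "X11"]
res["X11", "X1"]
```

If this matches what `clr()` does, the asymmetry is a consequence of the row-wise centring rather than a bug in the input data.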

  • (a) Could you please make your example reproducible? You can use `dput()` to make a copy/pasteable version of a subset of your data, e.g. `dput(dm[1:5, 1:5])` for the first 5 rows and columns of `dm`. (b) Also please make your code reproducible. `clr()` is not in the base R packages, and when I google I first get [this `clr` package for curved linear regression](https://rdocumentation.org/packages/clr/versions/0.1.2/topics/clr). But that doesn't sound like the function you're using. What package is your `clr` from? – Gregor Thomas Jun 09 '23 at 14:12
  • I don't know anything about the compositions package, but [this `clr` definition](https://en.wikipedia.org/wiki/Compositional_data#Center_logratio_transform) looks simple enough. But I'm a bit confused about how this would apply to the data you show because of the 0s.... – Gregor Thomas Jun 09 '23 at 15:56
  • The definition has you dividing each entry by the geometric mean and then taking the log. The geometric mean of any set of numbers including 0 is 0, and you can't divide by 0. And `log(0)` results in `-Inf`. So I can't explain the result you are getting from the `clr()` function, but I suspect that you shouldn't use it on your data because of the 0 diagonal. – Gregor Thomas Jun 09 '23 at 15:56
  • If it weren't for those issues, I'd recommend implementing the clr directly, `results = dm_matrix; results[] = log(dm_matrix / exp(mean(log(dm_matrix))))` – Gregor Thomas Jun 09 '23 at 15:59
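
The direct implementation suggested in the last comment can be adapted so that it avoids `log(0)`. This is only a minimal sketch: `global_clr` is a hypothetical name, and skipping the zero diagonal is an assumption made here to sidestep the problem, not part of the CLR definition. It centres the logs of the off-diagonal entries by a single global mean log, so equal inputs always give equal outputs.

```r
# First three samples from the question's distance matrix (symmetric)
ids <- c("X1", "X11", "X12")
dm_matrix <- matrix(
  c(0,            0.2177609998,  0.2133361674,
    0.2177609998, 0,             0.07805820645,
    0.2133361674, 0.07805820645, 0),
  nrow = 3, byrow = TRUE, dimnames = list(ids, ids)
)

# Hypothetical helper: one global centring value for the whole matrix.
# Assumption: the diagonal is left at 0 instead of being log-transformed.
global_clr <- function(m) {
  off <- row(m) != col(m)        # TRUE for off-diagonal cells
  stopifnot(all(m[off] > 0))     # log() needs strictly positive distances
  out <- m
  # log(x / geometric mean) is the same as log(x) - mean(log(x))
  out[off] <- log(m[off]) - mean(log(m[off]))
  out
}

res_global <- global_clr(dm_matrix)
isSymmetric(res_global)  # TRUE
```

Whether a log-ratio transform is appropriate for a distance matrix at all is a separate question; the sketch only shows that a single centring value preserves symmetry.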
