I obtained a table of class "dist" in which samples were compared against each other to calculate an index. As a result, each value appears twice and self-comparisons occur. See the example table below:

        Sample1 Sample2 Sample3
Sample1 0       0.5     1
Sample2 0.5     0       0.8
Sample3 1       0.8     0

I already removed the self-comparisons (Sample1 vs. Sample1, etc.), but I do not know how to remove the redundant values (i.e. the upper half of the table). The desired output is a table like the one below, which I can then melt into a data.frame to build plots with. Each sample also has a specific type, which I want to use in the plots.

        Sample1 Sample2 Sample3
Sample1
Sample2 0.5
Sample3 1       0.8

Var1    Var2    Type1 Type2 Value
Sample1 Sample2 a     b     0.5
Sample1 Sample3 a     a     1
Sample2 Sample3 b     a     0.8
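
For reference, here is a minimal base-R sketch of the transformation being asked for, using the toy values from the example table. The object names `m`, `types`, and `long` are made up for illustration, and the sample types are read off the desired output above. This is one possible approach, not necessarily the best fit for a phyloseq workflow.

```r
# Toy matrix matching the example table in the question
m <- matrix(c(0,   0.5, 1,
              0.5, 0,   0.8,
              1,   0.8, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Sample", 1:3), paste0("Sample", 1:3)))

# Blank out the diagonal (self-comparisons) and the redundant upper triangle
m[upper.tri(m, diag = TRUE)] <- NA

# Long format: one row per remaining sample pair
idx <- which(!is.na(m), arr.ind = TRUE)
long <- data.frame(Var1  = colnames(m)[idx[, "col"]],
                   Var2  = rownames(m)[idx[, "row"]],
                   Value = m[idx],
                   stringsAsFactors = FALSE)

# Hypothetical sample types, looked up by sample name
types <- c(Sample1 = "a", Sample2 = "b", Sample3 = "a")
long$Type1 <- unname(types[long$Var1])
long$Type2 <- unname(types[long$Var2])

print(long)
```

Starting from a "dist" object instead of a plain matrix, `as.matrix()` would give the square matrix used as input here.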
plicht
  • Can you share the step before your first result? You can probably modify your code to go directly to your desired output. – ktiu Jun 22 '21 at 15:36
  • Hi @ktiu, this is the code I used: `library(phyloseq)` `library(reshape2)` `library(dplyr)` `# calculate distances and coerce into matrix` `# this generates the first table in my starting post` `wu = phyloseq::distance(physeqObject, DistanceMeasure)` `wu.m = melt(as.matrix(wu)) %>%` `mutate_if(is.factor,as.character)` `# remove self-comparison` `wu.m = wu.m %>%` `filter(as.character(Var1) != as.character(Var2)) %>%` `mutate_if(is.factor, as.character)` – plicht Jun 23 '21 at 07:35
  • `# get sample data from phyloseq object and combine with distance matrix` `# this generates the last table from my starting post` `sd = as.matrix(physeqObject@sam_data) %>% as.data.frame() %>% select(sample, sampleType) %>% mutate_if(is.factor,as.character)` `colnames(sd) = c("Var1", "Type1")` `wu.sd = left_join(wu.m, sd, by = "Var1")` `colnames(sd) = c("Var2", "Type2")` `wu.sd = left_join(wu.sd, sd, by = "Var2")` – plicht Jun 23 '21 at 07:38
  • This is all very hard to reproduce. My tip is to look into `usedist::dist_make()`, but in order to tailor it to your code, it would be necessary that you provide us with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) that we can copy and paste to better understand the issue and test possible solutions. – ktiu Jun 23 '21 at 08:21

1 Answer

Thanks a lot, the pointer to the usedist package got me to the intended solution. In the end I used usedist::dist_groups().

After generating the "dist" object with phyloseq::distance(), I extracted the grouping variable from the phyloseq object with:

library(phyloseq)

# `group` holds the name of the sample variable to group by
group2samp <- list()
group_list <- get_variable(sample_data(physeq), group)
for (groups in levels(group_list)) { # loop over the group levels
    target_group <- which(group_list == groups)
    group2samp[[groups]] <- sample_names(physeq)[target_group]
}

Then I melted the resulting "group2samp" list and rearranged the order of the first column to match with my distance matrix:

library(reshape2)    
item_groups = melt(group2samp)

library(dplyr)
item_groups = arrange(item_groups, value)
# needed to reverse the column to match with my distance matrix
item_groups = item_groups[order(nrow(item_groups):1),]
item_groups = item_groups$L1 #extract only grouping variable

library(usedist)
distances = dist_groups(distance_matrix, item_groups)

distances
     Item1    Item2      Group1      Group2                          Label   Distance
1    sample9  sample8       Patch      Plaque       Between Patch and Plaque 0.94344640
2    sample9 sample70       Patch nonlesional  Between nonlesional and Patch 0.60253312
3    sample9 sample69       Patch       Patch                   Within Patch 0.62086228
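
Reversing the row order happened to line the groups up with my distance matrix, but that depends on how the samples are sorted. A more robust variant is to look up each sample's group by name, in the label order of the "dist" object. The sketch below uses made-up toy data (`pts`, `s1` to `s4`, and the two groups are hypothetical stand-ins) and only base R, so it illustrates the idea rather than the exact code I ran.

```r
# Toy stand-ins (hypothetical data, not from my dataset)
pts <- matrix(c(0, 1, 3, 6), ncol = 1,
              dimnames = list(c("s1", "s2", "s3", "s4"), NULL))
distance_matrix <- dist(pts)
group2samp <- list(Patch = c("s3", "s1"), Plaque = c("s2", "s4"))

# Named lookup vector: sample name -> group label
sample_to_group <- setNames(
    rep(names(group2samp), lengths(group2samp)),
    unlist(group2samp, use.names = FALSE)
)

# Reorder the group labels to follow the labels of the dist object
item_groups <- unname(sample_to_group[labels(distance_matrix)])
print(item_groups)

# item_groups can then be passed on, e.g.
# usedist::dist_groups(distance_matrix, item_groups)
```

With this lookup, the group vector stays aligned with the distance matrix regardless of the order in which the samples were collected into `group2samp`.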
plicht