
#The original data is in an Excel document shared via this Dropbox link: [https://www.dropbox.com/scl/fi/soqzrov4bfqvwq2cp4gqk/Sums.xlsx?dl=0&rlkey=i07voviv6uun8pc503kyvdeee]

#In this Excel document, the 1st column contains the identifying number of each gene. The other columns show whether mutations of these genes (because of diseases) can cause symptoms in specific areas. For example, if the first gene has any effect on the biliary tract, the corresponding number in the column "abdomenBiliaryTractExists" should be >1. As it is 0, there is no effect.

#Starting with these data, I used the "dist" function with the "binary" method to measure the distance.

> newobject <- dist(Sums, method = "binary", diag = FALSE, upper = FALSE, p = 2)

#Then I used the "hclust" and "cutree" functions.

> binary <- hclust(newobject)
> newbinary <- cutree(binary, h = 0.35)
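#The three steps above can be tried on a small made-up matrix. This is only a sketch: `toy` and its row and column names are hypothetical stand-ins for Sums, chosen so the binary distances are easy to check by hand.

```r
# Hypothetical toy data standing in for Sums:
# rows are genes, columns are symptom areas (non-zero = affected).
toy <- matrix(c(1, 0, 1,
                1, 0, 1,
                0, 1, 0,
                0, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3", "g4"),
                              c("areaA", "areaB", "areaC")))

# Binary (Jaccard-type) distance: among the columns where at least one
# of the two genes is non-zero, the proportion where only one is non-zero.
toy_dist <- dist(toy, method = "binary")

# Hierarchical clustering (complete linkage is the hclust default),
# then cut the tree at height 0.35 as in the question.
toy_tree <- hclust(toy_dist)
toy_clusters <- cutree(toy_tree, h = 0.35)

# toy_clusters is an integer vector named by gene: g1 and g2 are identical,
# so they share a cluster; g3 and g4 are 0.5 apart and stay separate.
print(toy_clusters)
```

The result of cutree() keeps the row names of the input as names, which is what makes it possible to look up genes by cluster later.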

#I created a table where the first row shows the clusters and the second row shows the number of genes each one includes (the first cluster contains 1 gene, the second cluster contains 20 genes, and so on).

> table(newbinary)
 1  2  3  4  5  6
 1 20  4  6  2  4

#Then a second table, built from the first one, shows how many clusters (2nd row) contain a given number of genes (1st row). For example, there are 700 clusters with only 1 gene, 527 clusters with 2 genes, and so on.

> table(table(newbinary))
1    2   3   4   5   6   7   
700 527 218  89  56  26  21 

#I now need to find, for example, which genes belong to the 21 clusters that contain 7 genes each (last column of the previous table).
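#One possible way to do this lookup is sketched below. The function name `genes_in_clusters_of_size` is hypothetical, and the sketch assumes the vector returned by cutree() is named by gene (that is, the gene identifiers are the row names of the data passed to dist()); if they sit in a regular column instead, that column would have to be used in place of names().

```r
# Hypothetical helper: return the members of every cluster that has
# exactly `size` members. `clusters` is a named integer vector such as
# the result of cutree(), with gene identifiers as names.
genes_in_clusters_of_size <- function(clusters, size) {
  sizes <- table(clusters)                     # members per cluster
  wanted <- names(sizes)[sizes == size]        # ids of clusters of that size
  members <- split(names(clusters), clusters)  # gene names grouped by cluster
  members[wanted]
}

# Small made-up example: clusters 1 and 3 have two genes each.
example_clusters <- c(g1 = 1L, g2 = 1L, g3 = 2L, g4 = 3L, g5 = 3L)
pairs <- genes_in_clusters_of_size(example_clusters, 2)
print(pairs)

# With the objects from the question, the call would be:
# genes_in_clusters_of_size(newbinary, 7)
```

The result is a list with one element per matching cluster, named by cluster number, so length() of the result should agree with the corresponding count in table(table(newbinary)).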

Marina
  • Hi and welcome to Stack Overflow. It would be easier for you to get help if you edited your question to add a [reproducible example](https://stackoverflow.com/a/5963610/13968222), so we can test possible solutions. – Andrea M Jul 23 '22 at 08:14
  • Hi and thank you. I hope my changes are clear and helpful. – Marina Jul 24 '22 at 13:38
  • I was able to get a list of the genes included in each cluster: > df3 <- data.frame(gene = rownames(Sums), cluster = newbinary) > by(df3$gene, df3$cluster, matrix) – Marina Jul 28 '22 at 07:49

0 Answers