0

I am hierarchically clustering gene expression data. My result data is shaped like dendrogram. I want to keep the whole tree in some data structure in python and do some calculations in each node (I think recursively). For every node I know the genes in there and some extra information (GO, p-values etc..) Do you have any suggestions on how to store this kind of data in python in a way that I can traverse the whole tree? My first thought was a list of dictionaries:

clusters=[{'id': 1, 'cluster': [gen1, gen2,...], 'size': ... , 'ChildIDs': ... , 'ParentID': ... , 'distance': ..., 'score': ...}, {'id': 2, ...}, ... ]

But since the clusters are nested, then storing the genes for every cluster is not efficient, I think.

If anyone has a better idea how to keep this kind of info, I would appreciate it:)

The6thSense
  • 8,103
  • 8
  • 31
  • 65
Liis Kolberg
  • 153
  • 12
  • You could maybe take a look to [this implementation of hierarchical clustering](http://www.nltk.org/_modules/nltk/cluster/gaac.html). It stores the information in a [Dendogram class](http://www.nltk.org/_modules/nltk/cluster/util.html). If you look at the method `show` of the class `Dendogram` you might get an idea of what it is doing. I don't know how many genes you're talking about, probably a lot, and how efficient this implementation will be... Hope it helps anyway. – lrnzcig Sep 06 '15 at 17:26

1 Answers1

0

Short of writing your own datastructure, collections.defaultdict might be appropriate for a tree. See here.

Community
  • 1
  • 1
  • Thanks! I'm not sure that I can get the 'cluster' out from this structure. The problem is, that I have like 23000 genes as leaves, the root cluster has all the genes, the next child cluster has less genes from these 23000 etc ... If I know I want to see the genes from cluster with id:3, then how could that be done? I think it is best to keep the gene names only once as leafs and then according to cluster id I would get the names. I'm quite new to this kind of structures. – Liis Kolberg Sep 04 '15 at 12:21
  • Sorry, I only have a vague idea of what you're against, but would hashing a set (`frozenset`) or dictionary be a viable option to store a reference that can be decoded, compared etc.? – Maksim Yegorov Sep 04 '15 at 12:55