
I'm working with Allen Brain's mouse RNA-seq data. From the provided dend.json file, I want to build a dictionary where each key is a parent node and each value is the list of nodes that the parent splits into. You can see the dendrogram here.

Loading the JSON file gives a nested dictionary that looks like this:

{'node_attributes': [{'height': 0.8416,
   'members': 290,
   'edgePar.col': '#000000',
   'edgePar.lwd': 2,
   'edgePar.conf': 1,
   'label': '',
   'midpoint': 256.4472,
   'cell_set_accession': 'CS1910120323',
   'cell_set_alias': '',
   'cell_set_designation': 'Neuron/Non-Neuron',
   'X': '291',
   'node_id': 'n1'}],
 'children': [{'node_attributes': [{'height': 0.6271,
     'members': 279,
     'edgePar.col': '#000000',
     'edgePar.lwd': 2,
     'edgePar.conf': 1,
     'label': '',
     'midpoint': 226.7537,
     'cell_set_accession': 'CS1910120324',
     'cell_set_alias': '',
     'cell_set_designation': 'Neuron/Non-Neuron',
     'X': '292',
     'node_id': 'n2'}],
   'children': [{'node_attributes': [{'height': 0.365,
       'members': 271,
       'edgePar.col': '#000000',
       'edgePar.lwd': 2,
       'edgePar.conf': 1,
       'label': '',
       'midpoint': 178.695,
       'cell_set_accession': 'CS1910120325',
       'cell_set_alias': '',
       'cell_set_designation': 'Neuron 001-271',
       'X': '293',
       'node_id': 'n3'}],............

Here dictionary['children'][0] is the left split and, when a node splits in two, dictionary['children'][1] is the right split.
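To make that convention concrete, here is a small sketch of two accessor helpers. It assumes each node keeps its metadata in a one-element 'node_attributes' list (as in the dump above), that leaves may instead use 'leaf_attributes' (an unverified assumption about the file), and that a missing or empty 'children' list means a leaf. The names attrs and left_right are my own, and the tree is a hand-made toy, not real data.

```python
def attrs(node):
    """Return the metadata dict for a node.

    Assumption: internal nodes keep it in a one-element 'node_attributes'
    list; leaves may use 'leaf_attributes' instead. Adjust if needed.
    """
    key = 'node_attributes' if 'node_attributes' in node else 'leaf_attributes'
    return node[key][0]


def left_right(node):
    """Return (left, right) child dicts, using None where a split is absent.

    children[0] is taken as the left split and children[1] as the right
    split; a missing or empty 'children' list is treated as a leaf.
    """
    children = node.get('children', [])
    left = children[0] if len(children) > 0 else None
    right = children[1] if len(children) > 1 else None
    return left, right


# Tiny hand-made tree in the same shape as dend.json (not real data)
tree = {
    'node_attributes': [{'node_id': 'n1'}],
    'children': [
        {'node_attributes': [{'node_id': 'n2'}]},
        {'node_attributes': [{'node_id': 'n3'}]},
    ],
}

left, right = left_right(tree)
print(attrs(tree)['node_id'], attrs(left)['node_id'], attrs(right)['node_id'])
# prints: n1 n2 n3
```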

I want the output to look something like this:

{'n1': ['n2', 'n281'],
 'n2': ['n3', 'n284'], ...}
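One way to get that shape is a pre-order recursion that records each parent's immediate children before descending into them. This is a minimal sketch, assuming the 'node_attributes'/'leaf_attributes' layout and that leaves have a missing or empty 'children' list; build_children_map and its key parameter are hypothetical names, and the tree is a toy example rather than real data.

```python
def attrs(node):
    # Assumption: metadata sits in a one-element 'node_attributes' list, or in
    # 'leaf_attributes' for leaves. Change the keys if your file differs.
    key = 'node_attributes' if 'node_attributes' in node else 'leaf_attributes'
    return node[key][0]


def build_children_map(node, key='node_id', out=None):
    """Map each parent's `key` to its immediate children's keys, left first.

    Leaves (missing or empty 'children') do not get an entry of their own.
    """
    if out is None:
        out = {}
    children = node.get('children', [])
    if children:
        out[attrs(node)[key]] = [attrs(child)[key] for child in children]
        for child in children:
            build_children_map(child, key, out)
    return out


# Hand-made example tree (not real data): n1 -> n2, n5; n2 -> n3, n4
tree = {
    'node_attributes': [{'node_id': 'n1'}],
    'children': [
        {'node_attributes': [{'node_id': 'n2'}],
         'children': [
             {'node_attributes': [{'node_id': 'n3'}]},
             {'node_attributes': [{'node_id': 'n4'}]},
         ]},
        {'node_attributes': [{'node_id': 'n5'}]},
    ],
}

print(build_children_map(tree))
# {'n1': ['n2', 'n5'], 'n2': ['n3', 'n4']}
```

On the real data, pass the dict returned by json.load. Passing key='cell_set_accession' keys the map on accession IDs instead of node IDs. Because Python dicts keep insertion order, the keys come out in the same depth-first, left-first order as the walk output.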

At the moment, I'm only able to walk the dictionary and print the node IDs, using code adapted from another post:

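import json

# Load the dendrogram (adjust the path to wherever dend.json lives)
with open('dend.json') as f:
    dend = json.load(f)

def walk(d):
    """Recursively print every node_id found in the nested dendrogram dict."""
    for k, v in d.items():
        if k == 'node_id' and isinstance(v, (str, int, float)):
            print('node:', v)
        elif isinstance(v, list):
            # 'node_attributes' and 'children' are both lists of dicts
            for item in v:
                if isinstance(item, dict):
                    walk(item)

walk(dend)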

Output:
node: n1
node: n2
node: n3
node: n4
node: n183
node: n184
node: n185


This might be close to what you want.

https://github.com/danielsf/AllenInstTools_by_SFD/blob/master/parse_dendrogram.py

It defines a class, CellNode, that stores, for each node in the dendrogram, the node's name (its cell_set_accession) along with lists of the names of its ancestors, its children (immediate children only) and its ultimate children (every node descended from it). The method build_tree returns a dict keyed on cell_set_accession whose values are the corresponding CellNode objects.
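For anyone who wants the same idea without the script, here is a rough sketch of that kind of index. It is not the code from parse_dendrogram.py: the Node dataclass, build_index and its key parameter are hypothetical names, and it assumes the 'node_attributes'/'leaf_attributes' layout plus a missing or empty 'children' list for leaves. The toy tree uses placeholder accessions, not real ones.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    ancestors: list = field(default_factory=list)          # root first
    children: list = field(default_factory=list)           # immediate children
    ultimate_children: list = field(default_factory=list)  # all descendants


def _attrs(node):
    # Assumption: leaves may store metadata under 'leaf_attributes'.
    key = 'node_attributes' if 'node_attributes' in node else 'leaf_attributes'
    return node[key][0]


def build_index(tree, key='cell_set_accession'):
    """Return {name: Node} for every node in the nested dendrogram dict."""
    index = {}

    def visit(raw, ancestors):
        name = _attrs(raw)[key]
        kids = raw.get('children', [])
        node = Node(name, list(ancestors), [_attrs(k)[key] for k in kids])
        index[name] = node
        for kid in kids:
            # Each call returns the child's name followed by its descendants
            node.ultimate_children.extend(visit(kid, ancestors + [name]))
        return [name] + node.ultimate_children

    visit(tree, [])
    return index


def _n(acc, *kids):
    # Helper to build toy nodes; 'A'..'E' are placeholders, not real accessions
    d = {'node_attributes': [{'cell_set_accession': acc}]}
    if kids:
        d['children'] = list(kids)
    return d


tree = _n('A', _n('B', _n('C'), _n('D')), _n('E'))
index = build_index(tree)
print(index['A'].ultimate_children)  # ['B', 'C', 'D', 'E']
print(index['C'].ancestors)          # ['A', 'B']
```

The key argument plays the same role as changing the naming field in the script, so build_index(dend, key='node_id') would key everything on n1, n2, and so on.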

If you don't like using cell_set_accession as the name for the nodes, you can change that at line 120 of the script.

If you want more or less information in your dict, note that leaf nodes are easy to identify: node.children is an empty list for them.
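The same leaf test works directly on the raw JSON. Below is a sketch, assuming a leaf is a node whose 'children' list is missing or empty and that leaves may keep metadata under 'leaf_attributes'; leaf_ids is a hypothetical helper. As a sanity check on the real file, the number of leaves should match the root's 'members' value (290 in the question), since in R dendrograms 'members' counts the leaves under a node.

```python
def leaf_ids(node, key='node_id'):
    """Return the `key` of every leaf under `node`, in left-to-right order.

    Assumption: a leaf has a missing or empty 'children' list, and its
    metadata may live in 'leaf_attributes' instead of 'node_attributes'.
    """
    attr_key = 'node_attributes' if 'node_attributes' in node else 'leaf_attributes'
    children = node.get('children', [])
    if not children:
        return [node[attr_key][0][key]]
    leaves = []
    for child in children:
        leaves.extend(leaf_ids(child, key))
    return leaves


# Hand-made example tree (not real data); the root has 3 leaf members
tree = {
    'node_attributes': [{'node_id': 'n1', 'members': 3}],
    'children': [
        {'node_attributes': [{'node_id': 'n2', 'members': 2}],
         'children': [
             {'leaf_attributes': [{'node_id': 'n3', 'members': 1}]},
             {'leaf_attributes': [{'node_id': 'n4', 'members': 1}]},
         ]},
        {'leaf_attributes': [{'node_id': 'n5', 'members': 1}]},
    ],
}

leaves = leaf_ids(tree)
print(leaves)  # ['n3', 'n4', 'n5']
print(len(leaves) == tree['node_attributes'][0]['members'])  # True
```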

The code was good enough for my purposes (which is a nice way of saying I haven't rigorously tested it). Feel free to reach out if something doesn't work as expected.

  • Thanks, Scott. This looks promising. I'm having a bit of trouble generating an output. At the command prompt I'm running `python parse_dendrogram.py --dend_name data/dend.json > dend_out.txt` but am just getting a blank text file. Are there other arguments or considerations I should be aware of? – Munib Hasnain Mar 13 '20 at 16:04
  • You can ignore my first comment. I just realized I can run this in jupyter and everything works as expected. Thanks again. – Munib Hasnain Mar 13 '20 at 16:26