I have a nested dictionary annot_dict with the following structure:
- key = long unique string
- value = list of dictionaries
Each dictionary in that list has the structure:
- key = long unique string (a subcategory of the upper dictionary's key)
- value = list of five string items
An example of the entire structure is:
annot_dict['ID_string'] = [
{'ID_string': ['attr1a', 'attr1b', 'attr1c', 'attr1d', 'attr1e']},
{'string2' : ['attr2a', 'attr2b', 'attr2c', 'attr2d', 'attr2e']},
{'string3' : ['attr3a', 'attr3b', 'attr3c', 'attr3d', 'attr3e']},
]
The ID_string is the same as the first sub-dictionary key. This is the output of a GFF3 file parser function I wrote, and the real dictionary holds the genes (ID_string) and transcripts (string2, string3, ...) from the genome of human chromosome 9, if anyone is familiar with the structure of that file type. The attribute lists describe biotype, start index, end index, strand, and description.
I now want to put this information into a pandas DataFrame. I want to loop through the outermost keys (the ID_strings) in the dict to make one big DataFrame containing a row for each ID_string, followed by a row for each of its subcategories (string2, string3).
I want it to look like this:
| subunit_ID  | gene_ID     | start_index | end_index | strand   | biotype  | desc     |
|-------------|-------------|-------------|-----------|----------|----------|----------|
| 'ID_string' | 'ID_string' | 'attr1a'    | 'attr1b'  | 'attr1c' | 'attr1d' | 'attr1e' |
| 'string2'   | 'ID_string' | 'attr2a'    | 'attr2b'  | 'attr2c' | 'attr2d' | 'attr2e' |
| 'string3'   | 'ID_string' | 'attr3a'    | 'attr3b'  | 'attr3c' | 'attr3d' | 'attr3e' |
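
To make the target concrete, here is a minimal sketch of one way the flattening could work, using the placeholder example from above. The function name annot_dict_to_frame and the ATTR_COLUMNS list are hypothetical names chosen for illustration. The column order is assumed from the table above; the attribute description earlier lists biotype first, so attr_columns may need reordering to match the real parser output.

```python
import pandas as pd

# Assumption: attribute order follows the desired table, not the real parser.
ATTR_COLUMNS = ["start_index", "end_index", "strand", "biotype", "desc"]


def annot_dict_to_frame(annot_dict, attr_columns=ATTR_COLUMNS):
    """Flatten {gene_ID: [{subunit_ID: [five attrs]}, ...]} into one DataFrame."""
    rows = []
    for gene_id, subunits in annot_dict.items():
        for subunit in subunits:
            # Each sub-dictionary holds one subunit_ID -> attribute-list pair.
            for subunit_id, attrs in subunit.items():
                rows.append([subunit_id, gene_id, *attrs])
    return pd.DataFrame(rows, columns=["subunit_ID", "gene_ID", *attr_columns])


# Placeholder data copied from the example structure in the question.
annot_dict = {
    "ID_string": [
        {"ID_string": ["attr1a", "attr1b", "attr1c", "attr1d", "attr1e"]},
        {"string2": ["attr2a", "attr2b", "attr2c", "attr2d", "attr2e"]},
        {"string3": ["attr3a", "attr3b", "attr3c", "attr3d", "attr3e"]},
    ]
}

df = annot_dict_to_frame(annot_dict)
print(df)
```

Collecting plain lists first and building the DataFrame once at the end avoids appending to a DataFrame inside the loop, which tends to get slow for something the size of a whole chromosome.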
I did look at other answers, but none had quite the same dict structure as mine. This is my first question on SO, so please feel free to edit it for clarity.