print nested dict to tsv format file

Question

I have the following dict:

{'A1137': {'Called': 10, 'hom_alt': 10, 'private_hom': 8},
 'A2160': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
 'A2579': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
 'A2594': {'Called': 9, 'hom_alt': 1, 'hom_ref': 8}}

My desired output is:

stats A1137 A2160 A2579 A2594
Called 10 10 10 9
hom_alt 10 1 1 1
hom_ref 0 9 9 8
private_hom 8 0 0 0

As you can see, if a sample lacks a given counter, a zero should appear in its place. I have tried several approaches without success. I can write a simple dict this way, but not a nested one:

with open(res, 'w') as csvfile:
    w = csv.writer(csvfile, delimiter='\t')
    w.writerow(['#Global Statistics:'])
    for key, value in d.items():
        w.writerow([key, value])
    w.writerow(['\n'])
return res

Do you know all possible keys in the nested dictionaries up front, or should they be auto-discovered from the dictionaries themselves? — Martijn Pieters, Apr 25 '16 at 17:17
Ah, ok. For the moment these are the possible keys: Called, hom_ref, het, hom_alt, phased, private_het, private_hom. But maybe in the future I'd need to add more. — cucurbit, Apr 25 '16 at 17:25

score 1 · Accepted Answer · answered Apr 25 '16 at 17:29

This is easier using csv.DictWriter(), where you pass in a dictionary for each row.

You can auto-discover the keys by taking the union of all the nested dictionaries (iterating a dictionary yields its keys); these become the stats values in the first column of your output:

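import csv

fields = sorted(d)
stats = sorted(set().union(*d.values()))  # use d.itervalues() in Python 2

# newline='' lets the csv module control line endings (Python 3);
# in Python 2, open the file with mode 'wb' instead
with open(res, 'w', newline='') as csvfile:
    w = csv.DictWriter(csvfile, delimiter='\t', fieldnames=['stats'] + fields)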
    w.writerow({'stats': '#Global Statistics:'})
    w.writeheader()
    for stat in stats:
        # produce a dictionary mapping field name to specific statistic for
        # this row
        row = {k: v.get(stat, 0) for k, v in d.items()}
        row['stats'] = stat
        w.writerow(row)
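
The key-discovery step and the zero fallback can be checked in isolation. This is a minimal sketch using a shortened, illustrative copy of the question's dict; the variable names are only for demonstration:

```python
# Shortened sample data mirroring the question's dict (illustrative only).
d = {
    'A1137': {'Called': 10, 'hom_alt': 10, 'private_hom': 8},
    'A2160': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
}

# Unpacking the inner dicts into set().union() iterates each dict,
# which yields its keys, so the result is the set of all stat names.
stats = sorted(set().union(*d.values()))
print(stats)

# A sample that lacks a counter falls back to 0 via dict.get().
row = {sample: counts.get('hom_ref', 0) for sample, counts in d.items()}
print(row)
```

Because the stat names come from the data, adding a new counter such as het later should not require any change to the writing code.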

Demo:

>>> import csv
>>> import sys
>>> d = {'A1137': {'Called': 10, 'hom_alt': 10, 'private_hom': 8},
...      'A2160': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
...      'A2579': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
...      'A2594': {'Called': 9, 'hom_alt': 1, 'hom_ref': 8}}
>>> fields = sorted(d)
>>> stats = sorted(set().union(*d.values()))
>>> w = csv.DictWriter(sys.stdout, delimiter='\t', fieldnames=['stats'] + fields)
>>> w.writerow({'stats': '#Global Statistics:'})
#Global Statistics:
>>> w.writeheader()
stats   A1137   A2160   A2579   A2594
>>> for stat in stats:
...     # produce a dictionary mapping field name to specific statistic for
...     # this row
...     row = {k: v.get(stat, 0) for k, v in d.items()}
...     row['stats'] = stat
...     w.writerow(row)
...
Called  10      10      10      9
hom_alt 10      1       1       1
hom_ref 0       9       9       8
private_hom     8       0       0       0

Thank you very much @Martijn. One question: is it possible to maintain the order of the dict? In this particular case A1137, A2160, A2579, A2594 happen to be sorted, but imagine that A1137 were named A3137, so that the keys are A3137, A2160, A2579, A2594. Is it possible to have the output table in this order? — cucurbit, Apr 26 '16 at 13:47
@cucurbit: no, because dictionaries do not have an order. They are unordered structures. See [Why is the order in Python dictionaries and sets arbitrary?](https://stackoverflow.com/a/15479974) — Martijn Pieters, Apr 26 '16 at 14:03
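
The comment above reflects Python as of 2016. Since Python 3.7, dicts preserve insertion order as a language guarantee, so on a recent interpreter the columns can follow the order in which the samples were added by using list(d) instead of sorted(d). The following is a sketch under that assumption; it writes to an in-memory buffer as a stand-in for the output file, and on older versions collections.OrderedDict would be needed instead:

```python
import csv
import io

# Sample names deliberately not in sorted order (A3137 first), as in the comment.
d = {
    'A3137': {'Called': 10, 'hom_alt': 10, 'private_hom': 8},
    'A2160': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
    'A2579': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
    'A2594': {'Called': 9, 'hom_alt': 1, 'hom_ref': 8},
}

fields = list(d)  # insertion order; assumes Python 3.7 or later
stats = sorted(set().union(*d.values()))

buf = io.StringIO()  # stand-in for the real output file
w = csv.DictWriter(buf, delimiter='\t', fieldnames=['stats'] + fields,
                   lineterminator='\n')
w.writeheader()
for stat in stats:
    # map each sample name to its value for this statistic, defaulting to 0
    row = {k: v.get(stat, 0) for k, v in d.items()}
    row['stats'] = stat
    w.writerow(row)

print(buf.getvalue())
```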

Alan · Answer 2 · 2016-04-25T18:17:42.650

from collections import defaultdict

data = {
 'A1137': {'Called': 10, 'hom_alt': 10, 'private_hom': 8},
 'A2160': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
 'A2579': {'Called': 10, 'hom_alt': 1, 'hom_ref': 9},
 'A2594': {'Called': 9, 'hom_alt': 1, 'hom_ref': 8}
}

fields = "stats","Called","hom_alt","hom_ref","private_hom"

newdata = list()
for (k,v) in data.items():
    d = defaultdict(int)
    d.update(v)
    d["stats"] = k
    newdata.append(d)

table = [fields]
for d in newdata:
    table.append([d[f] for f in fields])

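# First, a pretty print. Each transposed row has one cell per entry in
# table (the field-name column plus one column per sample), so the
# format width depends on len(table), not on the size of a single dict.
fmt = "{:<11}" + "{:>6}" * (len(table) - 1)
for row in zip(*table):
    print(fmt.format(*row))

# Then the same rows, tab-separated.
tsvfmt = "\t".join(["{}"] * len(table))
for row in zip(*table):
    print(tsvfmt.format(*row))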
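
The code above prints to the console, while the question asks for a file. One possible way to bridge that, sketched here with a small hand-written stand-in for table and an in-memory buffer in place of a real file, is to pass the transposed rows to csv.writer:

```python
import csv
import io

# Hypothetical stand-in for the `table` built above: one row of field
# names followed by one row per sample.
table = [
    ("stats", "Called", "hom_alt", "hom_ref", "private_hom"),
    ["A1137", 10, 10, 0, 8],
    ["A2160", 10, 1, 9, 0],
]

# Replace the buffer with open(path, 'w', newline='') to write a real file.
out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
# zip(*table) transposes the table so each statistic becomes one row.
writer.writerows(zip(*table))
print(out.getvalue())
```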
