I'm trying to produce correlation-matrix / contingency-table style output for my data. I am working with a dictionary that holds overlap measures between pairs of clusters (the number of clusters differs from run to run):

    overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}

The overlap between clusters 0 and 0 is 0.5, between clusters 1 and 0 it is 0.2, and so on. I would like to print this as a table:

         0    1
    0  0.5  0.2
    1  0.2  0.0

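One way to get that layout is to expand the sparse dict into a full square grid first, looking each pair up as stored and then as its mirror image. The sketch below assumes every key is an (i, j) tuple of sortable cluster labels. The helper name to_matrix is hypothetical (not from any library), and missing pairs become None.

```python
def to_matrix(overlap):
    """Expand a sparse {(i, j): value} dict into a full square grid.

    Only one of (i, j) / (j, i) needs to be stored; the other is mirrored.
    Pairs stored in neither direction become None.
    """
    # Every cluster that appears anywhere in a key, once, in sorted order
    labels = sorted({k for pair in overlap for k in pair})
    matrix = []
    for r in labels:
        row = []
        for c in labels:
            # Try the pair as stored, then its mirror image
            row.append(overlap.get((r, c), overlap.get((c, r))))
        matrix.append(row)
    return labels, matrix


overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}
labels, matrix = to_matrix(overlap)
print("\t" + "\t".join(str(c) for c in labels))
for label, row in zip(labels, matrix):
    print(str(label) + "\t" + "\t".join(str(v) for v in row))
```

Because the labels come from the keys themselves, the grid grows or shrinks with however many clusters the dict contains.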
I thought it would be easy, but I'm stuck. Here is what I have so far. First, I get my rows and columns:

    t = overlap.items()
    # first cluster of each (i, j) key
    column_names = [i[0][0] for i in t]
    # [second cluster, overlap value] for each key
    rows = [[i[0][1], i[1]] for i in t]

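A quick check of what that kind of comprehension yields on the sample dict: taking key[0] from every key repeats a cluster once per pair it belongs to, and it would miss a cluster that only ever appears in the second position. A set built from both key positions gives each cluster exactly once. This is a sketch assuming the keys are (i, j) tuples.

```python
overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}

# Same idea as column_names: the first cluster of every key,
# so cluster 1 shows up once for each pair it is part of
first = sorted(key[0] for key in overlap)   # [0, 1, 1]

# Every cluster seen in either position of a key, each listed once
clusters = sorted({label for key in overlap for label in key})
print(clusters)  # [0, 1]
```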
I make a string template to fill those values in:

    template = "{}\t" * len(column_names)

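The template expands to one {} replacement field per column. Python's argument unpacking (the * operator) passes each list item to format() as a separate positional argument, so the same call works for any number of columns. In the sketch below, the values list is a hypothetical data row chosen for illustration.

```python
column_names = [0, 1]
values = [0.5, 0.2]  # hypothetical data row for cluster 0

# "{}\t" * 2 == "{}\t{}\t": one replacement field per column
template = "{}\t" * len(column_names)

# *column_names unpacks the list, so format() receives each label as a
# separate positional argument, whatever the list length is
header = "\t" + template.format(*column_names)

# For a data row, prepend the row label and widen the template by one field
row_template = "{}\t" * (len(values) + 1)
row = row_template.format(0, *values)

print(repr(header))  # '\t0\t1\t'
print(repr(row))     # '0\t0.5\t0.2\t'
```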
Then I try to fill it by printing the column names and iterating over the rows, and that's where I get stuck:

    print(template.format(??))
    for row in rows:
        print(template.format(??))

I don't know how to:

1. Get format() to accept the items of the lists (either columns or rows) one by one, especially since I don't have the same number of clusters every time.
2. Deal with the duplicate pairs (1-2 vs. 2-1): should I fill in the mirrored value, or replace it with whitespace?
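For the duplicate pairs, one option is to print each pair only once: show the lower triangle and leave every cell above the diagonal blank, since it would just mirror a value already printed. The sketch below assumes (i, j) tuple keys with sortable labels. The helper name lower_triangle_lines and its blank parameter are hypothetical, chosen for illustration.

```python
def lower_triangle_lines(overlap, blank=""):
    """Render each pair once: cells above the diagonal are left blank."""
    labels = sorted({label for key in overlap for label in key})
    lines = ["\t" + "\t".join(str(c) for c in labels)]
    for i, r in enumerate(labels):
        cells = []
        for j, c in enumerate(labels):
            if j > i:
                cells.append(blank)  # mirror of a cell already printed
            else:
                # Stored pair, else its mirror, else an empty cell
                value = overlap.get((r, c), overlap.get((c, r), ""))
                cells.append(str(value))
        lines.append(str(r) + "\t" + "\t".join(cells))
    return lines


for line in lower_triangle_lines({(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}):
    print(line)
```

Passing blank=str(value) style logic instead (that is, always looking up the mirror) gives the full symmetric table; which one reads better depends on how many clusters there are.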
Is this even possible, or advisable, as printed output?
I looked at PrettyTable and tabulate, which were recommended elsewhere, but couldn't get them to work either. I could use pandas or another stats module, but that seems like overkill when all I want is to print these values.
Edit: here is what I ended up doing, where overlap is my input dictionary:

    entries = overlap.items()
    # sorted() gives a predictable order; set() alone does not
    column_names = sorted(set(i[0][0] for i in entries))
    row_names = sorted(set(i[0][1] for i in entries))

    # Header row: a blank 25-character cell, then one cell per column
    coltemplate = "\t{:<25}" * len(column_names)
    print("{:25}".format(" ") + " " + coltemplate.format(*column_names))
    for r in row_names:
        result = []
        for c in column_names:
            if c == r:
                # Diagonal (a cluster against itself): placeholder
                result.append("***")
            elif (c, r) in overlap:
                result.append(overlap[(c, r)])
            else:
                # Only one of (c, r) / (r, c) is stored; use the mirror,
                # or a placeholder if neither exists
                result.append(overlap.get((r, c), "***"))
        result = [str(i) for i in result]
        rowtemplate = "\t{:25}" * len(result)
        print("{:>25}".format(r) + " " + rowtemplate.format(*result))
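The fixed width of 25 works, but the column width can also be derived from the data so narrow values don't leave wide gaps. The sketch below assumes one shared width for all columns is acceptable and keeps the same "***" placeholder idea for the diagonal and for missing pairs. The function name render and its diagonal / missing parameters are hypothetical. The nested "{:>{w}}" field is standard str.format syntax for a width supplied at call time.

```python
def render(overlap, diagonal="***", missing="***"):
    """Return the overlap table as one string with right-aligned columns."""
    labels = sorted({label for key in overlap for label in key})
    grid = []
    for r in labels:
        row = []
        for c in labels:
            if r == c:
                row.append(diagonal)  # a cluster against itself
            else:
                # Stored pair, else its mirror, else the missing marker
                row.append(str(overlap.get((r, c), overlap.get((c, r), missing))))
        grid.append(row)

    # The widest label or cell decides a single shared column width
    cells = [str(x) for x in labels] + [cell for row in grid for cell in row]
    width = max(len(cell) for cell in cells)

    def line(items):
        # "{:>{w}}" takes the field width from the keyword argument w
        return " ".join("{:>{w}}".format(item, w=width) for item in items)

    lines = [line([""] + labels)]
    for r, row in zip(labels, grid):
        lines.append(line([r] + row))
    return "\n".join(lines)


print(render({(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}))
```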