
I'm trying to create correlation-matrix / contingency-table style output for my data. I am working with a dictionary that contains overlap measures between clusters, and the number of clusters differs from case to case.

overlap={(0,0): 0.5, (1,0):0.2, (1,1):0.0}

The overlap between clusters "0" and "0" is 0.5, and so on. Now I would like to print it like this:

    0       1
0   0.5     0.2
1   0.2     0
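Because the dictionary stores each pair only once (for example (1, 0) but not (0, 1)), any code that builds the full table needs a lookup that tries both orientations. A minimal sketch; `lookup` is a hypothetical helper name, and returning `None` for pairs missing in both orders is an assumption:

```python
def lookup(overlap, a, b, default=None):
    """Return the overlap for clusters a and b, trying (a, b) first, then (b, a)."""
    # Use "in" rather than truthiness so a stored 0.0 is not treated as missing.
    if (a, b) in overlap:
        return overlap[(a, b)]
    return overlap.get((b, a), default)


overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}
print(lookup(overlap, 0, 1))  # found via the mirrored key (1, 0)
print(lookup(overlap, 2, 0))  # pair not present in either order
```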

I thought it would be easy enough but I am totally stuck at this point. Here is what I have done so far: I get my rows and columns.

t=overlap.items()
column_names=[i[0][0] for i in t]
rows=[[i[0][1], i[1]] for i in t]

I make a string template to fill those values in:

template="{}\t"*len(column_names)

Then I try to fill this by writing out the column names and iterating over the rows. And that's when I get stuck:

print template.format(??)
for row in rows:
    print template.format(??)
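The ?? slots can be filled with argument unpacking: `template.format(*values)` passes each list element as a separate positional argument, so it works for any number of clusters as long as the template has the same number of {} fields. A sketch, assuming the row labels and values have already been collected into lists (the `rows` contents below are hypothetical, not the output of the code above):

```python
column_names = [0, 1]
rows = [[0, 0.5, 0.2], [1, 0.2, 0.0]]  # row label followed by one value per column

# One "{}\t" field per column, plus a leading field for the row label.
template = "{}\t" * (len(column_names) + 1)

print(template.format("", *column_names))  # header: empty corner cell, then column names
for row in rows:
    print(template.format(*row))  # * spreads the list into separate arguments
```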

I don't know:

  • how to get format to accept the items of the lists (either columns or rows) one at a time, especially as I don't have the same number of clusters every time;

  • whether I should fill in duplicate values (1-2 vs 2-1) or replace them with whitespace;

  • whether this is even possible or advisable as printed output.
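On the duplicate question: since the overlap is symmetric, one option is to print only the lower triangle and leave the mirrored cells blank. A minimal sketch of that choice, assuming rows and columns share the same cluster labels (a square table); `format_lower_triangle` is a hypothetical name, and pairs missing in both orders also come out blank:

```python
def format_lower_triangle(overlap, labels):
    """Return table lines that show each pair once and blank the upper triangle."""
    lines = ["\t" + "\t".join(str(c) for c in labels)]  # header row
    for i, r in enumerate(labels):
        cells = []
        for j, c in enumerate(labels):
            if j > i:
                cells.append("")  # mirrored duplicate: leave blank
            elif (r, c) in overlap:
                cells.append(str(overlap[(r, c)]))
            else:
                cells.append(str(overlap.get((c, r), "")))  # try the reversed key
        lines.append(str(r) + "\t" + "\t".join(cells))
    return lines


overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}
for line in format_lower_triangle(overlap, [0, 1]):
    print(line)
```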

I looked at PrettyTable and tabulate, which were recommended elsewhere, but couldn't get those to work either. I guess I could use Pandas or some other stats module, but that seems like overkill when all I want to do is print these values.

Edit: Here is what I ended up doing, where "overlap" is my input dictionary:

entries = overlap.items()
column_names = sorted(set(key[0] for key, _ in entries))
row_names = sorted(set(key[1] for key, _ in entries))
coltemplate = "\t{:<25}" * len(column_names)
print("{:25}".format(" "), coltemplate.format(*column_names))
for r in row_names:
    result = []
    for c in column_names:
        if c == r:
            result.append("***")  # diagonal cell: print a marker instead of the value
        elif overlap.get((c, r)) is None:
            result.append(overlap.get((r, c), "***"))  # fall back to the mirrored key
        else:
            result.append(overlap[(c, r)])
    result = [str(i) for i in result]
    rowtemplate = "\t{:25}" * len(result)
    print("{:>25}".format(r), rowtemplate.format(*result))
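A possible variation on the code above: instead of hard-coding a width of 25, the column width can be computed from the data and passed into the format spec as a nested field ({:>{w}}). This is a sketch, assuming the same overlap dictionary and the same "***" marker for the diagonal; `render` is a hypothetical name:

```python
def render(overlap):
    """Return the table as a string, with the width derived from the longest cell."""
    cols = sorted({c for c, _ in overlap})  # first element of each key
    rows = sorted({r for _, r in overlap})  # second element of each key

    def cell(c, r):
        if c == r:
            return "***"  # diagonal marker
        if (c, r) in overlap:
            return str(overlap[(c, r)])
        return str(overlap.get((r, c), "***"))  # mirrored key, else marker

    table = [[cell(c, r) for c in cols] for r in rows]
    labels = [str(x) for x in cols + rows]
    w = max(len(s) for s in labels + [x for line in table for x in line])

    lines = [" " * w + "".join(" {:>{w}}".format(str(c), w=w) for c in cols)]
    for r, line in zip(rows, table):
        row_cells = "".join(" {:>{w}}".format(x, w=w) for x in line)
        lines.append("{:>{w}}".format(str(r), w=w) + row_cells)
    return "\n".join(lines)


overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0}
print(render(overlap))
```

The width adapts to the longest label or value, so the table stays compact for short numbers and still lines up for long cluster names.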
patrick

1 Answer


I am relatively new to the computational field, but I think I have a solution. If this doesn't help or isn't convenient, please tell me why (I have a lot to learn).

overlap = {(0, 0): 0.5, (1, 0): 0.2, (1, 1): 0.0, (2, 1): 0.3, (2, 0): 0.4}
t = overlap.items()

liste_columns = sorted(set(i[0][0] for i in t))  # column names: first element of each key
liste_rows = sorted(set(i[0][1] for i in t))     # row names: second element of each key

header = '\t' + '\t'.join(str(c) for c in liste_columns)  # header line with column names
print(header)
for row in liste_rows:
    print(row, end='\t')  # row name
    for column in liste_columns:
        key = (column, row)  # key for accessing the value in the dict
        if key in overlap:
            value = overlap[key]
        else:
            # fall back to the reversed tuple, e.g. (0, 1) -> (1, 0); blank if neither exists
            value = overlap.get(key[::-1], '')
        print(value, end='\t')
    print()

For reversing the tuple, see "How to reverse tuples in Python?".

Output:

    0   1   2
0   0.5 0.2 0.4
1   0.2 0.0 0.3

Hope this helps.

PS: If you need further explanation, feel free to ask.

RomainL.
  • Hi Romain, thanks, good stuff! However, this does break on me when I have an uneven number of clusters, correct? Like comparing three clusters from set 1 to two clusters from set 2? – patrick Apr 21 '16 at 15:51
  • You are totally right, I'm editing my answer to support this case! – RomainL. Apr 21 '16 at 16:29
  • Please consider upvoting if you find my answer useful. – RomainL. Apr 22 '16 at 14:51
  • Hi Romain, I ended up just putting a "try" block around this to make it work for uneven numbers. Thanks for the good work! – patrick Apr 24 '16 at 02:24
  • Actually, I have modified it and it should work for uneven numbers. If that's not the case, could you paste your code so I can see where I was wrong? Thanks – RomainL. Apr 25 '16 at 12:41
  • I posted my code as an edit to the question (too long for the comments!). I used [.format()](https://pyformat.info/#string_pad_align) quite heavily to make it look a little prettier. Besides that, it's pretty much what you suggested; thanks for your help! – patrick Apr 26 '16 at 16:07
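For comparison, here is a sketch of the try/except KeyError idea mentioned in the comments, wrapped around the answer's reversed-tuple lookup. It is an illustration of the idea, not patrick's actual code. The data is hypothetical (three clusters on one axis, two on the other, with (1, 1) deliberately left out) so the blank-cell fallback shows up. Note that falling back to the reversed key only makes sense when the table is symmetric, i.e. the same clusters appear on both axes:

```python
overlap = {(0, 0): 0.5, (1, 0): 0.2, (2, 0): 0.4, (2, 1): 0.3}  # hypothetical; (1, 1) missing

columns = sorted({c for c, _ in overlap})  # 0, 1, 2
rows = sorted({r for _, r in overlap})     # 0, 1

lines = ["\t" + "\t".join(str(c) for c in columns)]
for row in rows:
    cells = []
    for column in columns:
        try:
            value = overlap[(column, row)]
        except KeyError:
            try:
                value = overlap[(row, column)]  # reversed tuple
            except KeyError:
                value = ""  # pair not present in either order
        cells.append(str(value))
    lines.append(str(row) + "\t" + "\t".join(cells))

print("\n".join(lines))
```

Functionally this matches a dict.get fallback; the nested try blocks just make the two lookup attempts explicit.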