
I have a massive dictionary of items in a co-occurrence format (basically, conditional word vectors). The simplified dictionary looks something like this:

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}

To save storage, I have decided not to store a pair at all when there is no co-occurrence. For example, a and b never occur with d, so the dictionary has no entry for either of those pairs.
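Because absent pairs are simply not stored, every lookup has to supply a default of 0. A minimal sketch using the same `reservoir` dictionary; the helper name `cooccurrence` is hypothetical, and `collections.Counter` is shown only as one standard-library type that already reads missing keys as 0:

```python
from collections import Counter

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}


def cooccurrence(counts, x, y):
    """Hypothetical helper: stored count for (x, y), or 0 if the pair was never stored."""
    return counts.get((x, y), 0)


print(cooccurrence(reservoir, 'c', 'd'))  # stored pair
print(cooccurrence(reservoir, 'a', 'd'))  # pair not stored, so 0

# A Counter copies the counts; indexing a missing key returns 0
# without adding that key, so the storage stays sparse.
sparse = Counter(reservoir)
print(sparse[('a', 'd')])
print(('a', 'd') in sparse)
```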

The result I'm trying to get is a matrix in which, for every tuple key (x, y), x is the row and y is the column, with 0 wherever a pair is not stored:

  a b c d
a 0 2 3 0
b 1 0 3 0
c 1 2 0 5
d 0 0 0 0
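One way to hold that matrix as a plain Python data structure is a list of row lists, built from the sparse dictionary. This is a sketch, not the only option: the names `labels` and `matrix` are illustrative, and the labels must be collected from both positions of each key because 'd' only ever appears second.

```python
import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}

# Flatten the key tuples so labels from both positions are collected.
labels = sorted(set(itertools.chain.from_iterable(reservoir)))

# First element of the key selects the row, second selects the column;
# pairs that were never stored become 0.
matrix = [[reservoir.get((row, col), 0) for col in labels] for row in labels]

print(labels)
for row in matrix:
    print(row)
```

Note that the dense form grows with the square of the number of labels, so for a very large vocabulary the sparse dictionary remains the better storage format; the dense matrix is only worth building for the part you actually want to inspect.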


I found the post Adjacency List and Adjacency Matrix in Python, but it's not quite what I'm looking for. All my attempts so far have been unsuccessful. Any help would be amazing.

Thanks again,

Swanson Ron
  • I've read through this a couple of times, but I am not entirely sure what you are asking. Are you asking to display such a matrix (I can't imagine trying to accommodate this amount of data in friendly display format, unless you are using something like a spreadsheet)? Or are you trying to represent the matrix in Pythonic data structures? Or are you after something else? – Justin O Barber Jan 21 '14 at 01:30
  • Do you need to implement the matrix at all? Couldn't you use numpy instead? – Arthur Julião Jan 21 '14 at 03:01
  • To answer the first question: I suppose both. As I am more familiar with R, I would like to load this into a data frame to do analysis on. Displaying the data in a spreadsheet-like format would simply be a proof of concept. For the second question: I've never used numpy for any type of implementation, but as I'm trying to make a conscious switch from R to Python, I will most definitely start looking into it. – Swanson Ron Jan 21 '14 at 06:16
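Since the stated goal is to analyse the matrix as a data frame in R, one standard-library route (an assumption, not something proposed in the thread) is to write the dense matrix as a labelled CSV file, which R can load with `read.csv(path, row.names = 1)`. The function name `write_matrix_csv` is hypothetical; numpy or pandas, as suggested above, are alternatives not shown here.

```python
import csv
import io
import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}

labels = sorted(set(itertools.chain.from_iterable(reservoir)))


def write_matrix_csv(counts, labels, stream):
    """Hypothetical helper: header row of column labels, then one labelled row per label."""
    writer = csv.writer(stream)
    writer.writerow([''] + labels)  # empty top-left cell, then column labels
    for row in labels:
        # Row label first, then the count for each column (0 if not stored).
        writer.writerow([row] + [counts.get((row, col), 0) for col in labels])


# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_matrix_csv(reservoir, labels, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a file object opened with `open('cooccurrence.csv', 'w', newline='')` (the file name is only an example); `newline=''` is what the `csv` module documentation recommends for files it writes.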

1 Answer


You really just need to get the labels for the rows and columns. From there, it's just a few for loops:

from __future__ import print_function

import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5
}

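# Every label from either position of a key tuple, in sorted order.
fields = sorted(set(itertools.chain.from_iterable(reservoir)))

# Header row: a blank corner cell, then the column labels.
print(' ', *fields)

for row in fields:
    # Row label first, then one cell per column.
    print(row, end=' ')

    for column in fields:
        # Pairs that were never stored print as 0.
        print(reservoir.get((row, column), 0), end=' ')

    print()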

Your table will start to look misaligned once cells have more than one digit, so I'll leave that to you to figure out. You'll just need to find the maximum width of the values in each column before printing them.
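The width fix described above can be sketched as follows: build every cell as a string first, take the longest cell in each column as that column's width, then right-align. The function name `format_matrix` is hypothetical, and right alignment is a choice made here for numbers, not something the answer specifies.

```python
import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}


def format_matrix(counts):
    """Hypothetical helper: render the co-occurrence dict as a right-aligned text table."""
    labels = sorted(set(itertools.chain.from_iterable(counts)))
    # Header row (blank corner cell), then one row of strings per label.
    rows = [[''] + labels]
    for row in labels:
        rows.append([row] + [str(counts.get((row, col), 0)) for col in labels])
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    return '\n'.join(
        ' '.join(cell.rjust(width) for cell, width in zip(r, widths))
        for r in rows
    )


print(format_matrix(reservoir))
```

Because the widths are computed per column, a single large count only widens its own column rather than the whole table.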

Blender