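    # needs `from collections import defaultdict` at the top of the module
    file = open(outFile, 'w+')

    # matrix[golden tag][predicted tag] -> number of times that pair occurred
    matrix = defaultdict(lambda: defaultdict(int))

    for s in range(len(self.goldenTags)):
        for w in range(len(self.goldenTags[s])):
            matrix[self.goldenTags[s][w].tag][self.myTags[s][w].tag] += 1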

I created a nested dictionary that represents the confusion matrix of a POS tagger, and it looks like:

         'VBP': defaultdict(<class 'int'>,
                            {'CD': 4,
                             'FW': 1,
                             'JJ': 5,
                             'JJS': 1,
                             'NN': 61,
                             'NNP': 6,
                             'NNPS': 1,
                             'SYM': 2,
                             'UH': 19,
                             'VB': 72,
                             'VBD': 5,
                             'VBG': 2,
                             'VBP': 537,
                             'VBZ': 1}),

This is kind of ugly. I want to save it in a neat matrix format to a txt file, preferably without using any library. What is a good way to do this?

     Tag Tag Tag Tag Tag   
Tag   1   0   2  inf  4
Tag   4   2   0   1   5
Tag  inf inf  1   0   3
Tag   3   4   5   3   0
user6792790

3 Answers

Using string formatting

d = {'VBP':{'CD': 4,'FW': 1,'JJ': 5,'NN': 61,'NNP': 6,'NNPS': 1,
            'SYM': 2,'VB': 72,'VBD': 5,'VBG': 2,'VBZ': 1},
     'xyz':{'CD': 4,'FW': 1,'JJS': 1,'NN': 61,'NNP': 6,'NNPS': 1,
            'UH': 19,'VB': 72,'VBD': 5,'VBP': 537,'VBZ': 1}}

# find all the columns and all the rows, sort them    
columns = sorted(set(key for dictionary in d.values() for key in dictionary))
rows = sorted(d)

# figure out how wide each column is
col_width = max(max(len(thing) for thing in columns),
                    max(len(thing) for thing in rows)) + 3

# preliminary format string : one column with specific width, right justified
fmt = '{{:>{}}}'.format(col_width)

# format string for all columns plus a 'label' for the row
fmt = fmt * (len(columns) + 1)

# print the header
print(fmt.format('', *columns))

# print the rows
for row in rows:
    dictionary = d[row]
    s = fmt.format(row, *(dictionary.get(col, 'inf') for col in columns))
    print(s)

>>>
            CD     FW     JJ    JJS     NN    NNP   NNPS    SYM     UH     VB    VBD    VBG    VBP    VBZ
    VBP      4      1      5    inf     61      6      1      2    inf     72      5      2    inf      1
    xyz      4      1    inf      1     61      6      1    inf     19     72      5    inf    537      1
>>> 

Put it in a function that yields the formatted strings instead of printing them, then iterate over that function and write each yielded line to the file.

wwii

Without using any libraries, you can still create CSV-style output using lists.

# create a nested dictionary
d = {'x': {'v1':4, 'v2':5, 'v3':12}, 
     'y':{'v1':2, 'v2':1, 'v3':11}, 
     'z':{'v2':5, 'v3':1}}

# get all of the row and column ids
row_ids = sorted(d.keys())
col_ids = sorted(set(k for v in d.values() for k in v.keys()))

# create an empty list and fill it with the header and then the rows
out = []

# header
out.append(['']+col_ids)

for r in row_ids:
    out.append([r]+[d[r].get(c, 0) for c in col_ids])

out
# returns
[['', 'v1', 'v2', 'v3'], 
 ['x', 4, 5, 12], 
 ['y', 2, 1, 11], 
 ['z', 0, 5, 1]]
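
To actually get that list into a file, each row can be joined into one line of text. This is only a sketch: it repeats the setup above so that it runs on its own, `matrix.csv` is a placeholder file name, and it assumes no label contains a comma. The Penn Treebank tag set does include a `,` tag, so for real POS data a tab separator is probably the safer choice.

```python
d = {'x': {'v1': 4, 'v2': 5, 'v3': 12},
     'y': {'v1': 2, 'v2': 1, 'v3': 11},
     'z': {'v2': 5, 'v3': 1}}

row_ids = sorted(d)
col_ids = sorted({k for v in d.values() for k in v})

# header row first, then one row per outer key (missing cells become 0)
out = [[''] + col_ids]
for r in row_ids:
    out.append([r] + [d[r].get(c, 0) for c in col_ids])

# join each row with commas; str() is needed because the counts are ints
lines = [','.join(str(cell) for cell in row) for row in out]

# 'matrix.csv' is a placeholder file name
with open('matrix.csv', 'w') as f:
    f.write('\n'.join(lines) + '\n')
```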
James

Instead of 'reinventing the wheel', use .xml, .json, or .ini. Plenty of libraries are available for these and more. For a simple example, check out https://docs.python.org/3/library/configparser.html.
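
As a rough illustration of the JSON route, here is a sketch using the standard-library `json` module. The sample counts and the file name `matrix.json` are made up, and note that this produces a nested listing rather than the grid layout the question asks for.

```python
import json
from collections import defaultdict

# small made-up confusion matrix in the same shape as the question's
matrix = defaultdict(lambda: defaultdict(int))
matrix['VBP']['VB'] += 72
matrix['VBP']['VBP'] += 537
matrix['VB']['VB'] += 410

# defaultdict is a dict subclass, so json can serialise it directly
text = json.dumps(matrix, indent=2, sort_keys=True)

# 'matrix.json' is a placeholder file name
with open('matrix.json', 'w') as f:
    f.write(text)

# reading it back gives plain dicts, not defaultdicts
with open('matrix.json') as f:
    restored = json.load(f)
```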