The task I'm stuck on is putting the table content of a file into a dictionary-of-dictionaries structure. The file contains something like this (first six lines of the ASCII file):

Name-----------|Alt name-------|------RA|-----DEC|-----z|---CR|----FX|---FX*|Error|---LX|--NH|ID-|Ref#----
RXCJ0000.1+0816 UGC12890 0.0295 8.2744 0.0396 0.26 5.80 5.39 12.4 0.37 5.9 1,3
RXCJ0001.9+1204 A2692 0.4877 12.0730 0.2033 0.08 1.82 1.81 17.9 3.24 5.1 1
RXCJ0004.9+1142 UGC00032 1.2473 11.7006 0.0761 0.17 3.78 3.68 12.7 0.93 5.3 2,4
RXCJ0005.3+1612 A2703 1.3440 16.2105 0.1164 0.24 4.96 4.94 11.8 2.88 3.7 B 2,5
RXCJ0006.3+1052 a) 1.5906 10.8677 0.1698 0.15 3.28 3.28 19.3 4.05 5.6 1

I can provide a file sample if necessary.

The following code works fine until it comes to storing each line dict in a second dict.

#!/usr/bin/env python3
from collections import OrderedDict
import re

obsrun = {}
objects = {}
# matches a coordinate such as 0.0295 (the dot is escaped to match a literal decimal point)
coord = re.compile(r'\d+\.\d\d\d\d')

filename = 'test.asc'

with open(filename, 'r') as f:
    lines = f.readlines()

for l in lines[2:]:
    #split the read line into a list
    o_bject = l.split()
    #print(o_bject)
    #iterate over each entry and populate the line-dictionary with values of interest
    #what's needed (in col of table): identifier, common name, right ascension, declination
    for k in o_bject:
        objects.__setitem__('id', o_bject[0])
        objects.__setitem__('common_name', o_bject[1])
        # sometimes the common name has blanks, multiple entries or replacements
        if coord.match(o_bject[2]):
            objects.__setitem__('ra', float(o_bject[2] ) )
            objects.__setitem__('dec', float(o_bject[3] ) )
        else:
            objects.__setitem__('ra', float(o_bject[3] ) )
            objects.__setitem__('dec', float(o_bject[4] ) )

    #extract the identifier (name of the object) for use as key
    name = objects.get('id')
    #print(name)

    print(objects) #*
    # as documented in http://stackoverflow.com/questions/1024847/add-to-a-dictionary-in-python
    obsrun[name] = objects
    #print(obsrun)

    #getting an ordered dictionary sorted by keys
    OrderedDict(sorted(obsrun.items(), key= lambda t: t[0] ) ) #t[0] keys,t[1] values

The console output shows that the inner for-loop does what it's supposed to do. This is confirmed by the print(objects) at *. But when it comes to storing the row dicts as values in the second dict, every entry ends up populated with the same values. The keys are built correctly.

What I don't understand is that print() displays the correct content of "objects", yet that content is not stored in "obsrun" correctly. Does the error lie in the dict view nature, or what did I do wrong?

How should I improve the code?

Thanks in advance, Christian

camaro

3 Answers

You created only one dictionary, so each time through the loop you are modifying the same one.

Move the line

objects = {}

into the body of the outer for loop (the one that iterates over the lines of the file), so that it runs once per line. This will create a separate dict for each line of the file.

Also, using __setitem__ directly is unnecessary and makes the code harder to read. Change the lines from objects.__setitem__('id', o_bject[0]) to objects['id'] = o_bject[0].
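A minimal sketch of the difference, using two rows copied from the question instead of the real file. The function names build_shared and build_fresh are made up for this illustration, and only the id and ra fields are filled in to keep it short:

```python
# Two rows from the sample table, already split into tokens
rows = [
    ["RXCJ0000.1+0816", "UGC12890", "0.0295", "8.2744"],
    ["RXCJ0001.9+1204", "A2692", "0.4877", "12.0730"],
]

def build_shared(rows):
    """Reproduces the problem: a single dict is reused for every row."""
    obsrun = {}
    objects = {}  # created once, outside the loop
    for row in rows:
        objects['id'] = row[0]
        objects['ra'] = float(row[2])
        obsrun[objects['id']] = objects  # stores a reference, not a copy
    return obsrun

def build_fresh(rows):
    """The fix: a new dict is created for each row."""
    obsrun = {}
    for row in rows:
        objects = {}  # created anew on every iteration
        objects['id'] = row[0]
        objects['ra'] = float(row[2])
        obsrun[objects['id']] = objects
    return obsrun

shared = build_shared(rows)
fresh = build_fresh(rows)

print(shared["RXCJ0000.1+0816"]["ra"])  # 0.4877 (overwritten by the last row)
print(fresh["RXCJ0000.1+0816"]["ra"])   # 0.0295
```

In the shared version every key of obsrun points at the same dict, so all entries show whatever the last row wrote into it.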

dsh

It's worth pointing out that you don't really need a dict-of-dicts unless you are trying to look up the entries by name. (You don't explain much what the use case is, here.)

The one thing that leaps out from your code is that you're using __setitem__ a lot. I think maybe you are coming from C++ or Java, where dictionaries do not have language support built in. In Python, this is not the case: you can say d[key] = value to add an item to a dictionary.

Here's some code to create a list (array) of dictionaries. It would be pretty trivial to make Table a dictionary keyed on one of the fields. I'll leave that for you to figure out. :)

Alternatively, a list is much easier to iterate over than a dict, if your problem is going to be performing computations on the data. So if you have to add up or average up or find the min/max, you probably want this version.

#!/usr/bin/env python3
from pprint import pprint

with open('test.asc') as data:
    # The header pads each column name with dashes; drop them and the trailing newline
    header = data.readline().strip().replace('-', '')
    Field_names = header.split('|')

    Table = []
    # Read in the remaining lines, one at a time
    for line in data:
        fields = line.split()
        # Note: zip pairs names and values by position, so a row with an
        # empty column (e.g. ID) or a blank inside the alt name will be misaligned
        if fields:  # skip blank lines
            Table.append(dict(zip(Field_names, fields)))

pprint(Table)
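One possible way to key the table on a field, sketched on a hand-written list standing in for the parsed file. The three records use values from the question's sample rows, and the name by_name is an assumption made for this example:

```python
# Stand-in for the parsed Table: one dict per row, values still strings
Table = [
    {"Name": "RXCJ0000.1+0816", "Alt name": "UGC12890", "z": "0.0396"},
    {"Name": "RXCJ0001.9+1204", "Alt name": "A2692", "z": "0.2033"},
    {"Name": "RXCJ0004.9+1142", "Alt name": "UGC00032", "z": "0.0761"},
]

# Dictionary keyed on the Name field, for lookups by identifier
by_name = {row["Name"]: row for row in Table}
print(by_name["RXCJ0001.9+1204"]["Alt name"])  # A2692

# The list form stays convenient for computations over all rows
redshifts = [float(row["z"]) for row in Table]
print(max(redshifts))  # 0.2033
```

Each row dict is shared between Table and by_name rather than copied, so both views stay in sync.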
aghast

So you're saying that assigning "objects" to obsrun just links to "objects" and does not copy its content? So I have to keep each inner dict separate, since it's only linked.

You're right about __setitem__. I used it to make it clearer to me what exactly I'm doing there.

I will try moving objects = {} into the inner for-loop.

Thanks for the answer. Will get back to report if that did the trick.

Update: That did it! Thanks so much, I really got stuck there, but I learned something important about dictionaries: in this case they are just linked, so it's memory saving already. Cheers, Christian
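The linking described here can be checked directly with the is operator, and an explicit copy is available when an independent dictionary is wanted. A small sketch with made-up keys and values:

```python
objects = {"id": "A", "ra": 1.0}
obsrun = {}

obsrun["linked"] = objects        # same dict object, reachable under a second name
obsrun["copied"] = dict(objects)  # shallow copy: a new dict holding the same values

objects["ra"] = 2.0  # modify the original afterwards

print(obsrun["linked"] is objects)  # True
print(obsrun["linked"]["ra"])       # 2.0 (follows the change)
print(obsrun["copied"]["ra"])       # 1.0 (unaffected)
```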

camaro