Matching Indices from Two Sets of Data

Question

I have a table with headings A B C D. The values under D are indexed by those under A,B,C.

I also have a list of objects indexed by values contained in the A and B columns, i.e. (A,B).

For each object I want to write to a file all the entries in the table that have the same A,B index as my object.

This is what I'm doing:

prescriptions = {}

#Open ABCD table and create a dictionary mapping A,B,C to D
with open(table_file) as table:
    reader = csv.reader(table, delimiter = '\t')
    for row in reader:
        code = (row[0], row[1], row[2])
        prescriptions[code]=row[3]

for x in objects:
    x_code = (x.A, x.B)

    for p in prescriptions:
        #check to see if A,B indices on x match those of the table entry
        if p[0:2] == x_code:
            row = prescriptions[p]
            line = ",".join(p) +"," + row +"\n"
            output.write(line)

This works. I get the exact output I want; however, when the table and the list get large, it takes an ungodly amount of time.

I'd love to modify my iterator (remove a p when I've found a match for it), but I know not to do that.

Is there anything I can do to speed this up?

Spare you the trouble, just import everything in a database (SQLite will do) and use SQL to match records. — Paulo Scardine, Jun 24 '13 at 22:17
do not write to output for every line. append to a list and write at the end in one shot. — joaquin, Jun 24 '13 at 22:22
@joaquin good idea. Am I deluded in hoping there's a way of doing this that is lightning fast, using Python, that I'm just overlooking? — ABM, Jun 24 '13 at 22:42

Benjamin Toueg · Accepted Answer · 2013-06-24T22:41:29.323

I guess prescription is a dictionary?

Why not having a dictionary prescription2 with A,B as key and list of C,D as value? It will spare you the hassle of iterating through all the dictionary.

prescriptions = {}
prescriptions2 = {}

#Open ABCD table and create a dictionary mapping A,B,C to D
with open(table_file) as table:
    reader = csv.reader(table, delimiter = '\t')
    for row in reader:
        code = (row[0], row[1], row[2])
        prescriptions[code]=row[3]
        key = (row[0],row[1])
        if not key in prescription2:
            prescription2[key] = []
        value = (row[2],row[3])
        prescription2[key].append(value)

for x in objects:
    x_code = (x.A, x.B)
    if x_code in prescription2:
        for item in prescription2[x_code]:
            line = ",".join(x_code+item)+"\n"
            output.write(line)

Matching Indices from Two Sets of Data

1 Answers1