Iterate through two files to create a new file that has fields from second file appended to fields of first file

Question

I am new to Python. I tried to adapt the logic from answers by @mgilson, @endolith, and @zackbloom (Zack's example).

I am getting a bunch of blank columns placed in front of the first field of the primary record.
My out_file is empty (more than likely because the columns from the two files cannot be matched up).

How can I fix this? The end result should look like the following:

('PUDO_id','Load_id','carrier_id','PUDO_from_company','PUDOItem_id','PUDO_id','PUDOItem_make')
('1','1','14','FMH MATERIAL HANDLING SOLUTIONS','1','1','CROWN','TR3520 / TWR3520','TUGGERS')
('2','2','7','WIESE USA','2','2','CAT','NDC100','3','2','CAT','NDC100','4','2',' 2 BATTERIES')

Note: The 3rd output row has 3 rows from the sub file appended to it, while each of the first 2 rows has only 1. A sub row is appended whenever pri[0] == sub[1] is True.

Here is my code based on @Zack Bloom:

def build_set(filename):
    # A set stores a collection of unique items.  Both adding items and searching for them
    # are quick, so it's perfect for this application.
    found = set()

    with open(filename) as f:
        for line in f:
            # Tuples, unlike lists, cannot be changed, which is a requirement for anything
            # being stored in a set.
            line = line.replace('"','')
            line = line.replace("'","")
            line = line.replace('\n','')
            found.add(tuple(sorted(line.split(';'))))
    return found

set_primary_records = build_set('C:\\temp\\oz\\loads_pudo.csv')
set_sub_records     = build_set('C:\\temp\\oz\\pudo_items.csv')
record                  = []

with open('C:\\temp\\oz\\loads_pudo_out.csv', 'w') as out_file:
    # Using with to open files ensures that they are properly closed, even if the code
    # raises an exception.

    for pri in set_primary_records :
        for sub in set_sub_records :
            #out_file.write(" ".join(res) + "\n")
            if sub[1] == pri [0] :
                record = pri.extend(sub)
            out_file.write(record + '\n')

Sample source data (primary records):

PUDO_id;"Load_id";"carrier_id";"PUDO_from_company"              
1;"1";"14";"FMH MATERIAL HANDLING SOLUTIONS"                
2;"2";"7";"WIESE USA"

Sample source data (sub records):

PUDOItem_id;"PUDO_id";"PUDOItem_make"
1;"1";"CROWN";"TR3520 / TWR3520";"TUGGERS"
2;"2";" CAT";"NDC100"
3;"2";"CAT";"NDC100"
4;"2";" 2 BATTERIES"
5;"11";"MIDLAND"

Why is PHP in the title? Also, linking to questions/answers works much better than tagging people's screen names. I, for one, am not going to look through their answer history and try to figure out which posts you're referring to. — Jake Sellers, Aug 07 '13 at 21:02
I have added a link to the example mentioned. (See [Zack's Example](http://stackoverflow.com/questions/7757626/compare-two-different-files-line-by-line-and-write-the-difference-in-third-file/7758213#7758213)). — Dr.EMG, Aug 08 '13 at 00:37

fzzylogic · Answer 1 · 2013-08-09T13:40:57.457

Tuples, which is what build_set creates, have no extend method. Tuples are immutable, but they can be concatenated with + or sliced like any other Python sequence.

For example:

with open('C:\\temp\\oz\\loads_pudo_out.csv', 'w') as out_file:
    for pri in set_primary_records :
        for sub in set_sub_records :
            if sub[1] == pri[0] :
                record = pri + sub
                out_file.write(str(record)[1:-1] + '\n')

This is the same code as above, just modified to allow for tuple concatenation. In the write line we convert record to a string and strip the opening and closing parentheses before appending '\n'. Maybe there are better / prettier ways to do this, but I'm new to Python too.
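As a small illustration of the point about tuples, the sketch below uses two rows taken from the sample data in the question. The variable names has_extend and line are only for illustration.

```python
pri = ('2', '2', '7', 'WIESE USA')
sub = ('3', '2', 'CAT', 'NDC100')

# Tuples are immutable, so they do not offer an extend() method.
has_extend = hasattr(pri, 'extend')

# Concatenation builds a brand-new tuple; pri and sub are left unchanged.
record = pri + sub

# str() of a tuple includes the surrounding parentheses; slicing drops them.
line = str(record)[1:-1]

print(has_extend)
print(line)
```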

Edit: To get the output you are expecting, a few changes are required:

# On this line, remove the sorted() call, as we do not want to change the order of the tuple items.
found.add(tuple(line.split(';')))

...

with open('C:\\temp\\loads_out.csv', 'w') as out_file:
    for pri in set_primary_records:
        record = pri                        # record tuple is set in main loop
        for sub in set_sub_records:
            if sub[1] == pri[0]:
                record += sub               # for each match, sub appended to record
        out_file.write(str(record) + '\n')  # removed stripping of brackets
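One thing to keep in mind is that a set does not preserve the order of the lines in the file, so the output rows may come out in any order. A possible alternative, sketched below, reads both files with the standard csv module (which also handles the quoting) and indexes the sub rows by PUDO_id in a dictionary. The function name join_records and the in-memory sample data are assumptions made for illustration; with real files you would pass open file objects instead.

```python
import csv
import io
from collections import defaultdict


def join_records(primary_file, sub_file):
    """Return one tuple per primary row with every matching sub row appended."""
    # Index the sub rows by their second column (PUDO_id), so each primary
    # row is matched with a dictionary lookup instead of an inner loop.
    subs_by_pudo = defaultdict(list)
    for row in csv.reader(sub_file, delimiter=';'):
        if len(row) > 1:
            subs_by_pudo[row[1]].append(row)

    joined = []
    for row in csv.reader(primary_file, delimiter=';'):
        if not row:
            continue  # skip blank lines
        record = list(row)
        for sub in subs_by_pudo.get(row[0], []):
            record.extend(sub)  # lists can be extended, unlike tuples
        joined.append(tuple(record))
    return joined


# Sample data copied from the question, held in memory for the example.
primary_data = io.StringIO(
    'PUDO_id;"Load_id";"carrier_id";"PUDO_from_company"\n'
    '1;"1";"14";"FMH MATERIAL HANDLING SOLUTIONS"\n'
    '2;"2";"7";"WIESE USA"\n'
)
sub_data = io.StringIO(
    'PUDOItem_id;"PUDO_id";"PUDOItem_make"\n'
    '1;"1";"CROWN";"TR3520 / TWR3520";"TUGGERS"\n'
    '2;"2";" CAT";"NDC100"\n'
    '3;"2";"CAT";"NDC100"\n'
    '4;"2";" 2 BATTERIES"\n'
    '5;"11";"MIDLAND"\n'
)

rows = join_records(primary_data, sub_data)
for record in rows:
    print(record)
```

Because the header rows share the value PUDO_id in the compared columns, the two headers are joined as well, which matches the first line of the expected output in the question.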

Your update has gotten me a step closer. However, I seem to be outputting a Cartesian product (every inner record connecting to every outer record). — Dr.EMG, Aug 08 '13 at 17:18
@Dr.EMG Updated to reflect the output you expected (I hope). Removed sorting of the tuple in build_set. 'record' is now built by appending sub matches. — fzzylogic, Aug 09 '13 at 13:45
