The following loop scans through two lists (source and master) for a matching ID (index 0). For each row where the ID matches, it looks for changed columns and prints them:

        for row in source:
            identifier = row[0]
            new_cols = row[1:]
            for row in master:
                old_cols = row[1:]
                if identifier == row[0]:
                    print(row[0]) # ID that matched
                    changed_cols = [col for col in new_cols if col not in old_cols] 
                    print(changed_cols) # cols that differ

Each row has over 20 columns, so I thought slicing with row[1:] would be smart, but I'm not sure how to get the index of each changed column with this approach. Thanks for any help.

UPDATE:

source = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''], 
['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '0', ''], 
['1130', '', '', '', '11HOT1A', '', 'LO', '4302', '99111', '0', '']]

master = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''], 
['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '1', ''], 
['1130', '', '', '', '13RA11', '', 'LO', '4302', '99111', '1', '']]
strongbad
  • I am a bit confused about what you are trying to achieve. Given the source row `[1,0,0,0]` and the corresponding master row `[1,0,0,1]`, your code (and the code in my answer) will not pick up any "changed columns". Are you sure this is what you want? Or do you want something like what Gary posted? – Michael S Priz Aug 06 '15 at 13:40
  • Hmmmm... You may have to be careful about duplicate entries in the row. Keep in mind the condition `col not in old_cols` will be false if `col` is *anywhere* in the list `old_cols`. Is this what you want? I will add an alternative to my solution that may be what you *actually* want. – Michael S Priz Aug 07 '15 at 12:44

3 Answers

Try creating a filter and use zip to fold together each matching row (i.e. zip together rows in source and master which have matching IDs.)

# A generator that yields True where the paired items differ.
def bool_gen(zipped):
    for tup in zipped:
        yield tup[0] != tup[1]

# Use enumerate to track the column index as you iterate over the generator.
for enum, item in enumerate(bool_gen(zip(source_row1, master_row1))):
    if item:
        # Print the index of the changed column.
        print(enum)

For source_row1 = [1,6,3,8], master_row1 = [5,6,7,8] this prints indices 0 and 2. You could also put that whole thing in a list comprehension if you want, as follows:

changed_cols = [enum for enum, item in enumerate(bool_gen(zip(source_row1, master_row1))) if item]
# changed_cols is [0, 2]

Putting this suggestion to work for your code:

for row in source:
    identifier = row[0]
    new_cols = row[1:]
    for row in master:
        old_cols = row[1:]
        if identifier == row[0]:
            print(row[0]) # ID that matched
            changed_cols = [enum for enum, item in enumerate(bool_gen(zip(new_cols, old_cols))) if item]
            print(changed_cols) # cols that differ

However, as you can see, it doesn't reduce the amount of code required nor make it more readable. I'm not certain which code would be more efficient.
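On the efficiency question, one option is to avoid the nested loop by indexing master by ID first. The following is only a sketch: `index_rows` and `changed_columns` are hypothetical helper names, and it assumes IDs are unique within master (with duplicates, the last row for an ID wins). It uses the data from the question's update.

```python
# Sketch only: index_rows and changed_columns are hypothetical helper names.
def index_rows(rows):
    """Map each row's ID (column 0) to the full row. Assumes unique IDs."""
    return {row[0]: row for row in rows}

def changed_columns(source, master):
    """Yield (ID, [indices of changed columns]) for IDs found in both lists."""
    master_by_id = index_rows(master)
    for new_row in source:
        old_row = master_by_id.get(new_row[0])
        if old_row is None:
            continue  # ID not present in master; skipped in this sketch
        # start=1 keeps the indices aligned with positions in the full row.
        pairs = enumerate(zip(new_row[1:], old_row[1:]), start=1)
        yield new_row[0], [i for i, (new, old) in pairs if new != old]

source = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''],
          ['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '0', ''],
          ['1130', '', '', '', '11HOT1A', '', 'LO', '4302', '99111', '0', '']]

master = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''],
          ['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '1', ''],
          ['1130', '', '', '', '13RA11', '', 'LO', '4302', '99111', '1', '']]

for identifier, cols in changed_columns(source, master):
    print(identifier, cols)
# 1002 []
# 1076 [9]
# 1130 [4, 9]
```

Each source row then costs one dictionary lookup instead of a full pass over master, which should matter more as the lists grow.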

Let us know if our answers are off the mark. If so, add some more details to your question.

gary

Have you considered using enumerate? Your list comprehension would change to this:

changed_cols = [(ID,col) for ID,col in enumerate(new_cols) if col not in old_cols]

This seems the simplest solution to me.

Let me know if I have misunderstood your question and I will work to adjust my solution :)

EDIT: I think you might want something like what Gary suggested:

changed_cols = [(ID,col) for ID,col in enumerate(new_cols) if col != old_cols[ID]]

This will compare only the corresponding old column for each new column. I would guess that this is the functionality you would actually want. Let me know if you are unsure of the difference :)
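In case it helps, here is a small side-by-side sketch of the two conditions, using the example rows from the comments on the question with the ID column already sliced off. The variable names `by_membership` and `by_position` are just for illustration.

```python
# Rows [1,0,0,0] (source) and [1,0,0,1] (master) without their ID column.
new_cols = [0, 0, 0]
old_cols = [0, 0, 1]

# Membership test: "does this value appear anywhere in old_cols?"
by_membership = [(i, col) for i, col in enumerate(new_cols) if col not in old_cols]

# Position-wise test: "does this value differ from the old value at the same index?"
by_position = [(i, col) for i, col in enumerate(new_cols) if col != old_cols[i]]

print(by_membership)  # []
print(by_position)    # [(2, 0)]
```

The membership test misses the change because the value 0 still occurs elsewhere in old_cols. Note that the position-wise version assumes old_cols is at least as long as new_cols.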

Michael S Priz

You should keep the column number when doing the comparisons. Otherwise you would not detect two columns swapping their values. You could do:

for row in source:
    identifier = row[0]
    new_cols = row[1:]
    for row in master:
        if identifier == row[0]:
            old_cols = row[1:]
            print(row[0]) # ID that matched
            n = min(len(new_cols), len(old_cols))  # compare only the columns both rows have
            changed_cols = [(i, old_cols[i], new_cols[i]) for i in range(n) if new_cols[i] != old_cols[i]]
            print(changed_cols) # cols that differ
            if len(new_cols) < len(old_cols): print(len(old_cols)-len(new_cols), "cols missing")
            if len(new_cols) > len(old_cols): print(len(new_cols)-len(old_cols), "cols added")
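
As a possible variation, the length bookkeeping could be folded into the comparison with `itertools.zip_longest`. This is a sketch under assumptions, not a drop-in replacement: `ABSENT` and `diff_columns` are hypothetical names, and the sentinel stands in for a column that one of the rows lacks.

```python
from itertools import zip_longest

ABSENT = object()  # hypothetical sentinel: the column does not exist in that row

def diff_columns(new_cols, old_cols):
    """Return (index, old, new) for every position where the two rows differ."""
    pairs = zip_longest(old_cols, new_cols, fillvalue=ABSENT)
    return [(i, old, new) for i, (old, new) in enumerate(pairs) if old != new]

# Two swapped columns plus one extra column in the new row.
changes = diff_columns(['a', 'b', 'c'], ['b', 'a'])
for i, old, new in changes:
    if old is ABSENT:
        print(i, "added:", new)
    elif new is ABSENT:
        print(i, "missing:", old)
    else:
        print(i, old, "->", new)
# 0 b -> a
# 1 a -> b
# 2 added: c
```

The swap at indices 0 and 1 shows up because each position is compared with its counterpart, which a plain membership test would not catch.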
Serge Ballesta