printing out the index of changed columns per row of a list

Question

The following loop scans through two lists (source and master) for a matching ID (index 0). For each row where the ID matches, it looks through the columns and prints the ones that changed:

        for row in source:
            identifier = row[0]
            new_cols = row[1:]
            for row in master:
                old_cols = row[1:]
                if identifier == row[0]:
                    print(row[0]) # ID that matched
                    changed_cols = [col for col in new_cols if col not in old_cols] 
                    print(changed_cols) # cols that differ

Each row has over 20 columns, so I thought slicing with row[1:] would be smart, but I'm not sure how to get the index of each changed column this way. Thanks for any help.

UPDATE:

source = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''], 
['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '0', ''], 
['1130', '', '', '', '11HOT1A', '', 'LO', '4302', '99111', '0', '']]

master = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''], 
['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '1', ''], 
['1130', '', '', '', '13RA11', '', 'LO', '4302', '99111', '1', '']]

I am a bit confused about what you are trying to achieve. Given the source row `[1,0,0,0]` and the corresponding master row `[1,0,0,1]`, your code (and the code in my answer) will not pick up any "changed columns". Are you sure this is what you want? Or do you want something like what Gary posted? — Michael S Priz, Aug 06 '15 at 13:40
Hmmmm... You may have to be careful about duplicate entries in the row. Keep in mind the condition `col not in old_cols` will be false if `col` is *anywhere* in the list `old_cols`. Is this what you want? I will add an alternative to my solution that may be what you *actually* want. — Michael S Priz, Aug 07 '15 at 12:44

gary · Answer 1 · 2015-08-07T00:45:25.813

Try creating a filter and use zip to fold together each matching row (i.e. zip together rows in source and master which have matching IDs.)

# A generator that returns True/False based on items matching.
def bool_gen(zipped):
    for tup in zipped:
        yield tup[0] == tup[1]

# Use enumerate to track the column index as you iterate over the generator.
for enum, item in enumerate(bool_gen(zip(source_row1, master_row1))):
    if item:
        # Print the index of each matching column.
        print(enum)

For source_row1 = [1,6,3,8], master_row1 = [5,6,7,8] this prints indices 1 and 3. You could also put that whole thing in a list comprehension if you want, as follows:

matching_cols = [enum for enum, item in enumerate(bool_gen(zip(source_row1, master_row1))) if item]
# matching_cols is [1, 3]; use "if not item" instead to get the columns that differ

Putting this suggestion to work for your code:

for row in source:
    identifier = row[0]
    new_cols = row[1:]
    for row in master:
        old_cols = row[1:]
        if identifier == row[0]:
            print(row[0]) # ID that matched
            # bool_gen yields True for matching columns, so keep the indices where it yields False.
            changed_cols = [enum for enum, item in enumerate(bool_gen(zip(new_cols, old_cols))) if not item]
            print(changed_cols) # cols that differ

However, as you can see, it doesn't reduce the amount of code required nor make it more readable. I'm not certain which code would be more efficient.
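For comparison, here is a minimal sketch of the same position-by-position check written without the helper generator. It assumes both rows have the same length (zip stops at the shorter one), and the names differing and matching are only illustrative:

```python
source_row1 = [1, 6, 3, 8]
master_row1 = [5, 6, 7, 8]

# Pair the columns position by position and keep the indices where they differ.
differing = [i for i, (new, old) in enumerate(zip(source_row1, master_row1)) if new != old]

# The complementary test gives the indices where the columns match.
matching = [i for i, (new, old) in enumerate(zip(source_row1, master_row1)) if new == old]

print(differing)  # [0, 2]
print(matching)   # [1, 3]
```

Whether this is faster than the generator version would need to be measured; it mainly saves one function definition.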

Let us know if our answers are off the mark. If so, add some more details to your question.

This code has different functionality from the list comprehension the OP used. Though this seems to make more sense. — Michael S Priz, Aug 06 '15 at 13:42
thank you! your solution gave me a better understanding of the whole process — strongbad, Aug 07 '15 at 06:56

Michael S Priz · Accepted Answer · 2015-08-07T12:49:55.660

Have you considered using enumerate? Your list comprehension would change to this:

changed_cols = [(ID,col) for ID,col in enumerate(new_cols) if col not in old_cols]

This seems the simplest solution to me.

Let me know if I have misunderstood your question and I will work to adjust my solution :)

EDIT: I think you might want something like what Gary suggested:

changed_cols = [(ID,col) for ID,col in enumerate(new_cols) if col != old_cols[ID]]

This will compare only the corresponding old column for each new column. I would guess that this is the functionality you would actually want. Let me know if you are unsure of the difference :)
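To make the difference concrete, here is a small sketch using the rows from the comment under the question (source [1,0,0,0] against master [1,0,0,1]). The names by_membership and by_position are made up for this illustration:

```python
source_row = [1, 0, 0, 0]
master_row = [1, 0, 0, 1]
new_cols = source_row[1:]
old_cols = master_row[1:]

# Membership test: a column only counts as changed if its value appears
# nowhere in old_cols, so the duplicated 0 hides the change.
by_membership = [(i, col) for i, col in enumerate(new_cols) if col not in old_cols]

# Positional test: each new column is compared only with the old column
# at the same index.
by_position = [(i, col) for i, col in enumerate(new_cols) if col != old_cols[i]]

print(by_membership)  # []
print(by_position)    # [(2, 0)]
```

Note that the index is relative to new_cols, which starts after the ID, so index 2 here is column 3 of the full row. The positional version also assumes old_cols is at least as long as new_cols; otherwise old_cols[i] raises an IndexError.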

Glad we could help :) — Michael S Priz, Aug 07 '15 at 12:41

score 1 · Answer 3 · answered Aug 06 '15 at 14:04

You should keep the column number when doing the comparisons. If you do not, you will not detect a swap between two columns. You could do:

for row in source:
    identifier = row[0]
    new_cols = row[1:]
    for row in master:
        if identifier == row[0]:
            old_cols = row[1:]
            print(row[0]) # ID that matched
            n = min(len(new_cols), len(old_cols))  # compare only the columns both rows have
            changed_cols = [(i, old_cols[i], new_cols[i]) for i in range(n) if new_cols[i] != old_cols[i]]
            print(changed_cols) # cols that differ
            if len(new_cols) < len(old_cols): print(len(old_cols)-len(new_cols), " cols missing")
            if len(new_cols) > len(old_cols): print(len(new_cols)-len(old_cols), " cols added")
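As a hedged end-to-end sketch, the same positional comparison can be run on the sample data from the question's update. This variant swaps the nested loop for a dictionary keyed by ID, which is a different technique from the code above and assumes IDs are unique within master. The function name changed_columns is hypothetical, and indices refer to the full row, including the ID at index 0:

```python
source = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''],
          ['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '0', ''],
          ['1130', '', '', '', '11HOT1A', '', 'LO', '4302', '99111', '0', '']]

master = [['1002', '', '', '', '13RA11', '', 'LO', '4302', '99111', '0', ''],
          ['1076', '', '', '', '13RA11', '', 'LO', '4302', '999111', '1', ''],
          ['1130', '', '', '', '13RA11', '', 'LO', '4302', '99111', '1', '']]


def changed_columns(source_rows, master_rows):
    """Map each ID found in both lists to a list of (index, old, new) tuples."""
    # Assumption: each ID occurs at most once in master_rows.
    master_by_id = {row[0]: row for row in master_rows}
    result = {}
    for new_row in source_rows:
        old_row = master_by_id.get(new_row[0])
        if old_row is None:
            continue  # this ID has no counterpart in master
        # zip stops at the shorter row, so extra trailing columns are ignored here.
        result[new_row[0]] = [
            (i, old, new)
            for i, (old, new) in enumerate(zip(old_row, new_row))
            if old != new
        ]
    return result


changes = changed_columns(source, master)
for identifier, cols in changes.items():
    print(identifier, cols)
# 1002 []
# 1076 [(9, '1', '0')]
# 1130 [(4, '13RA11', '11HOT1A'), (9, '1', '0')]
```

The dictionary lookup avoids rescanning master for every source row, which may matter for large lists. Rows of different lengths would still need the separate missing/added check shown above.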
