Compare 2 csv files and update columns

Question

I have 2 csv files with 2 rows 3 columns (id, name, value) that I want to compare. If there's a new row added to one of the files, the other one is updated as well. Likewise, if a value in one of the column changes the other file is updated.

Here's what I tried

a = '/path/to/file'
b = 'path/to/file'

with open(a, 'r') as f1, open(b, 'r') as f2:
    file1 = csv.DictReader(f1)
    file2 = csv.DictReader(f2)

    for row_new in file2:
        for row_old in file1:
            if row_new['id'] == row_old['id']:
                for k1, v1 in row_new.items():
                    for k, v in row_old.items():
                        if row_old[k1] == row_new[k]:
                            if v1 != v:
                                print(f'updated value for col {k}')
                                v1 = v
                            else:
                                print('Nothing to update')
            else:
                print(f'create row {row_new["id"]}')

I noticed that the iteration takes place only once. Am I doing something wrong here?

When you inspected/printed values and/or conditions at various points in your program was there an obvious place where it was misbehaving? If you are using an IDE **now** is a good time to learn its debugging features Or the built-in [Python debugger](https://docs.python.org/3/library/pdb.html). Printing *stuff* at strategic points in your program can help you trace what is or isn't happening. [What is a debugger and how can it help me diagnose problems?](https://stackoverflow.com/questions/25385173/what-is-a-debugger-and-how-can-it-help-me-diagnose-problems) — wwii, Dec 06 '20 at 14:52

wwii · Accepted Answer · 2020-12-06T15:23:25.527

I noticed that the iteration takes place only once...?

The inner loop is probably reaching the end of the file before the outer loop has a chance to make its next iteration. Try moving the file object's pointer back to the beginning after the inner loop stops.

with open(a, 'r') as f1, open(b, 'r') as f2:
    ...
    for row_new in file2:
        for row_old in file1:
            if row_new['id'] == row_old['id']:
                ...
            else:
                print(f'create row {row_new["id"]}')
        f1.seek(0)

Some would say that the nested for loops is what you are doing wrong. Here are some SO questions/answers to consider.

python update a column value of a csv file according to another csv file Python Pandas: how to update a csv file from another csv file
search python csv update one csv file based on another site:stackoverflow.com

Basically you should try to just read each file once and use data types that allow for fast membership testing like sets or dicts.

Your DictReaders will give you an {'id':x,'name':y,'value':z} dict for each row - causing you to use nested for loops to compare all the rows from one file to each row in the other. You could create a single dictionary using the id column for the keys and the dictionary values could be lists - {id:[name,value],id:[name,value],...} which may make the processing easier.

You also opened both your files for reading, open(...,'r'), so you'll probably find your files unchanged after you fix other things.

I run into an error `AttributeError: 'DictReader' object has no attribute 'seek'` — twingo, Dec 06 '20 at 15:21
Sorry, my bad. The error is `ValueError: I/O operation on closed file.` — twingo, Dec 06 '20 at 15:27
Fixed it. It was an indentation issue. Thank you very much ! — twingo, Dec 06 '20 at 15:31

Compare 2 csv files and update columns

1 Answers1