How to compare two csv files in Python

Question

I have two csv files. One is called 'Standard reg.csv', the other is 'Driver Details.csv'.

In 'Standard reg.csv' the first two lines are:

['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit']
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']

The first two lines in Driver Details.csv are:

['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration']
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']

My code is this:

import csv
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',')
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',')
for row in file_1:
    reg = row[3]
    avgspeed = row[7]
    for row in file_2:
        firstname = row[0]
        lastname = row[1]
        address = row[2]
        city = row[3]
        region = row[4]
        reg2 = row[7]
if reg  == reg2:
    print('Match found')
else:
    print('No match found')

It's a work-in-progress, but I can't seem to get the code to compare more than just the last line.

If I add print(reg) after the line reg2 = row[7], it shows that the whole column has been read. The entire column is also printed when I add print(reg2) after reg2 = row[7].

But at if reg == reg2: only the last values of both columns are compared, and I'm not sure how to fix this.

Thank you in advance.

score 1 · Answer 1 · answered Feb 28 '16 at 18:55

The testing condition if reg == reg2 appears outside both loops (for file_1 and for file_2). That is why the testing is only done with the last line from each file.

Another problem is that you use the same loop variable row in both for loops.
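
A minimal sketch of both fixes, assuming small in-memory stand-ins for the two files (the `standard_csv` and `drivers_csv` strings, the second plate `AB12CDE`, and the `find_matches` helper are hypothetical, introduced only for illustration). The second file is read into a list once so it can be scanned again for every row of the first, each loop has its own variable, and the comparison sits inside the inner loop:

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two CSV files.
standard_csv = (
    "Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit\n"
    "1,1,2016,NU16REG,1,1,1,5816.1667859699355\n"
    "2,1,2016,AB12CDE,2,2,2,10.5\n"
)
drivers_csv = (
    "FirstName,LastName,StreetAddress,City,Region,Country,PostCode,Registration\n"
    "Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,NU16REG\n"
)


def find_matches(standard_file, drivers_file):
    """Return (reg, first name, last name) for every plate found in both files."""
    standard_rows = csv.reader(standard_file)
    next(standard_rows)                                # skip the header row
    driver_rows = list(csv.reader(drivers_file))[1:]   # read once, drop the header

    matches = []
    for row in standard_rows:
        reg = row[3]
        for row2 in driver_rows:       # separate loop variable
            if reg == row2[7]:         # test inside the inner loop
                matches.append((reg, row2[0], row2[1]))
    return matches


matches = find_matches(io.StringIO(standard_csv), io.StringIO(drivers_csv))
print(matches)
```

This still compares every row with every other row, so it is only reasonable for small files.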

Sci Prog

I renamed for row in file_2 to for row2 in file_2, then indented the if and else into one of the loops, then into both. At the first indent level it just repeated No match found twice; at the second it repeated it many more times (probably 103 times, because Driver Details has 101 lines in reg and Standard has 2), and it didn't find a match. – Tilak Feb 28 '16 at 19:07

Martin Evans · Accepted Answer · 2016-03-01T18:27:50.180

I suggest you first load all of the details from the Driver Details.csv into a dictionary, using the registration number as the key. This would then allow you to easily look up a given entry without having to keep reading all of the lines from the file again:

import csv

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError as e:
            print('No match found')

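The code as you have it iterates through file_2 once, during the first pass of the outer loop, and leaves the reader at the end of the file, so the inner loop has nothing left to read for any later row of file_1. (If you broke out of the inner loop on a match, the reader would instead stay at the location of that match, and the next lookup would miss any matching rows earlier in the file.) For your approach to work you would have to start reading the file from the start on each pass of the outer loop, which would be very slow.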
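
To illustrate the point about the file position, here is a small sketch using a hypothetical two-line file held in memory. A csv.reader can only be consumed once; a second pass yields nothing unless the underlying file is rewound:

```python
import csv
import io

# Hypothetical two-line file held in memory.
f = io.StringIO("Registration\nNU16REG\n")
reader = csv.reader(f)

first_pass = list(reader)     # consumes every row
second_pass = list(reader)    # nothing left: the reader is exhausted

f.seek(0)                          # rewind the underlying file object
third_pass = list(csv.reader(f))   # a fresh reader sees all rows again

print(first_pass, second_pass, third_pass)
```

Rewinding like this for every row of the first file works, but it rereads the whole second file each time, which is why the dictionary lookup is preferable.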

To add an output csv and display the full address you could do something like the following:

import csv

speed = 74.3
fine = 35

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header
    csv_output = csv.writer(f_output)

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError as e:
            print('No match found')

This would print the following:

Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE

And produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35
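
If the speed should come from each row instead of a fixed value, something along these lines might work. This is only a sketch: it assumes the speed is the 'Speed over limit' column (index 7) of the layout shown in the question, keeps the fine as a fixed placeholder, and the `build_output_row` helper is hypothetical:

```python
def build_output_row(standard_row, driver_row, fine=35):
    """Combine driver details (without the registration column) with
    the reg plate, speed and fine.

    Assumes the column layout shown in the question.
    """
    reg = standard_row[3]
    speed = float(standard_row[7])     # 'Speed over limit' column
    return driver_row[0:7] + [reg, speed, fine]


standard_row = ['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']
driver_row = ['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour',
              'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']

output_row = build_output_row(standard_row, driver_row)
print(output_row)
```

The returned list could then be passed to csv_output.writerow() in place of driver[0:7] + [speed, fine].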

Thank you for the help, but there are some parts I don't understand: first, what the header is and what it does; second, the second-to-last line where it says ', e', as it throws a syntax error with that there but works without it. — Tilak, Mar 01 '16 at 17:51
The `header =` line is used to read the header in separately. If you `print(header)` you would see its contents. — Martin Evans, Mar 01 '16 at 17:54
I've made a change to the `except` line, so you could try that again. — Martin Evans, Mar 01 '16 at 17:55
Thank you for the swift response. I need to annotate the lines, but I am having some difficulty outputting the full address along with the reg and speed into a csv file, and printing a fine into the shell with the address it's going to. Sorry for the extra work, I'm quite new to Python. — Tilak, Mar 01 '16 at 18:05

score 0 · Answer 3 · answered Feb 28 '16 at 20:36

Try csv.DictReader to eliminate most of your lines of code:

import csv
from collections import defaultdict

Violations = defaultdict(list)

# Read in the violations, there are probably fewer violations than drivers (I hope!)
with open('Standard reg.csv', newline='') as violations:
    for v in csv.DictReader(violations):
        # Append rather than assign, so every violation for a plate is kept
        Violations[v['Reg Plate']].append(v)

with open('Driver Details.csv', newline='') as drivers:
    for d in csv.DictReader(drivers):
        # d is a dict, so the format string uses item access, not attribute access
        fullname = "{driver[FirstName]} {driver[LastName]}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))
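
As a rough illustration of how csv.DictReader and defaultdict(list) fit together, here is a sketch on hypothetical in-memory data (the plates and speeds below are made up). Each row comes back as a dict keyed by the header names, and appending to a defaultdict(list) groups several rows under one plate:

```python
import csv
import io
from collections import defaultdict

# Hypothetical data: the same plate recorded twice.
data = io.StringIO(
    "Reg Plate,Speed over limit\n"
    "NU16REG,12.5\n"
    "NU16REG,3.0\n"
    "AB12CDE,7.1\n"
)

violations = defaultdict(list)
for v in csv.DictReader(data):
    violations[v['Reg Plate']].append(v)    # v is a dict keyed by column name

counts = {plate: len(rows) for plate, rows in violations.items()}
print(counts)
```

Testing membership with `in` does not create a missing key, so plates without violations stay out of the dictionary.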

I wouldn't capitalize `Violations` because it's an instance rather than a class name. — pcurry, Feb 29 '16 at 06:35
