I'm attempting to compare two CSV files (using import csv) and I'm getting unexpected results.

I'm able to properly parse the CSV and I'm getting the expected output I need to properly compare the two files based on their appropriate column data -- email address from one file to email address in the other file.

What I don't understand is that when I run this, the first print row statement only shows the first entry in the CSV. If I move the print row to above or below the inner for loop, it properly iterates over all lines.

Effectively it's only comparing the first line in csv1 against all lines in csv2.

csv1 is a subset of potential values from csv2.

 import csv
 csv1 = csv.reader(csv1)
 csv2 = csv.reader(csv2)

 for row in csv1:
     # 'print row' works fine here.
     for line in csv2:
         print row  # <----- First Print Row Statement
         if row[2].lower() == line[2].lower():
             print row
         elif row[2].replace('olddomain.com', 'newdomain.com') == line[2]:
             print row
     # 'print row' works fine here too.
Andrew
  • By the way, it seems that you do not really need a double for loop, since you only want to know whether a row is `in` csv2, right? So you can use `in` after converting with `csv2 = list(csv2)`. See below for why you need the latter statement. – Ilja May 06 '16 at 14:02

2 Answers

Run csv.reader? in IPython (or help(csv.reader) in any Python shell) to see the docstring:

[...] the returned object is an iterator.

It is an iterator, not a list, so it can only be consumed once! Do csv2 = list(csv2) afterwards if you want the expected behaviour :)
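To make the single-pass behaviour concrete, here is a minimal sketch. It assumes Python 3 and uses a made-up in-memory string (via io.StringIO) in place of a real file; the variable names are illustrative only.

```python
import csv
import io

# Hypothetical in-memory data standing in for a real CSV file.
data = io.StringIO("a,b,c\n1,2,3\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes every row from the reader
second_pass = list(reader)  # the reader is already exhausted by now

print(first_pass)   # [['a', 'b', 'c'], ['1', '2', '3']]
print(second_pass)  # []
```

The list built on the first pass can be looped over as many times as needed, which is why converting csv2 with list() fixes the nested loop.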

For an explanation, see the Python reference on the concept of iterables, or the (globally leading ;)) 7430-upvote answer about yield ...

Ilja

After passing through the inner loop for line in csv2 the first time, you've exhausted the iterator csv2. So for the first row of csv1, the inner loop runs all the way through csv2. By the time you get to the second row, csv2 has nothing left to yield, so the inner loop body (including print row) never executes for any of the remaining rows.

A quick fix would be to load the rows of csv2 into a list before running the first for loop:
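The symptom can be reproduced in a few lines. This is only a sketch, assuming Python 3 and tiny made-up inputs held in io.StringIO objects; it counts how many times the inner loop body runs for each outer row.

```python
import csv
import io

# Hypothetical stand-ins for the two files: three rows and two rows.
csv1 = csv.reader(io.StringIO("r1\nr2\nr3\n"))
csv2 = csv.reader(io.StringIO("x\ny\n"))

inner_counts = []
for row in csv1:
    count = 0
    for line in csv2:  # only yields rows during the first outer iteration
        count += 1
    inner_counts.append((row[0], count))

print(inner_counts)  # [('r1', 2), ('r2', 0), ('r3', 0)]
```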

csv2_list = list(csv2)  # read every row of csv2 into memory once
for row in csv1:
    # 'print row' works fine here.
    for line in csv2_list:
        print row  # now runs for every row in csv1

I suggest reading up on Python iterators and generators. Unlike lists, they produce their items on the fly instead of having them all there waiting for you, so you only get to go through each one once.

Edit: I wouldn't recommend this solution if you have very large CSV files. Instead, iterate through both readers in the same loop, as shown in another solution below.
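One possible way to avoid the nested loop altogether, in the spirit of the `in` suggestion from the comments, is to collect only the email column of csv2 into a set and test each csv1 row against it. This is a sketch rather than a drop-in replacement: it assumes Python 3, the function name matching_rows and the sample data are hypothetical, and it lowercases both sides before comparing (slightly more lenient than the original elif branch).

```python
import csv
import io

def matching_rows(csv1_file, csv2_file, column=2,
                  old_domain="olddomain.com", new_domain="newdomain.com"):
    """Return rows of csv1 whose email, or its new-domain form, appears in csv2."""
    # Read csv2 exactly once, keeping only the lowercased email column.
    emails = {line[column].lower()
              for line in csv.reader(csv2_file) if len(line) > column}
    matches = []
    for row in csv.reader(csv1_file):
        if len(row) <= column:
            continue  # skip rows that have no email column
        email = row[column].lower()
        if email in emails or email.replace(old_domain, new_domain) in emails:
            matches.append(row)
    return matches

# Made-up sample data standing in for the two files.
csv1_file = io.StringIO(
    "1,Ann,ann@olddomain.com\n2,Bob,bob@example.com\n3,Cy,cy@example.com\n")
csv2_file = io.StringIO(
    "9,Ann,Ann@newdomain.com\n8,Cy,CY@example.com\n")

result = matching_rows(csv1_file, csv2_file)
print(result)
```

Each file is read only once, and only the set of email strings from csv2 is held in memory rather than every full row.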

Jad S