19

What is the pythonic way of iterating simultaneously over two lists?

Suppose I want to compare two files line by line (compare each ith line in one file with the ith line of the other file). I would want to do something like this:

file1 = csv.reader(open(filename1),...)
file2 = csv.reader(open(filename2),...)

for line1 in file1 and line2 in file2: #pseudo-code!
    if line1 != line2:
        print "files are not identical"
        break

What is the pythonic way of achieving this?


Edit: I am not using a file handler but rather a CSV reader (csv.reader(open(file),...)), and zip() doesn't seem to work with it...


Final edit: as @Alex M. suggested, zip() loads the files into memory on the first iteration, which is a problem for big files. On Python 2, using itertools.izip solves the issue.

Yuval Adam
  • Possible duplicate of [How can I iterate through two lists in parallel in Python?](http://stackoverflow.com/questions/1663807/how-can-i-iterate-through-two-lists-in-parallel-in-python) – Ciro Santilli OurBigBook.com Jan 13 '17 at 10:37

3 Answers

16

In Python 2, you should import itertools and use its izip:

import itertools

with open(file1) as f1:
  with open(file2) as f2:
    # izip pairs the lines lazily, one pair per iteration
    for line1, line2 in itertools.izip(f1, f2):
      if line1 != line2:
        print 'files are different'
        break

With the built-in zip, both files are read entirely into memory at the start of the loop, which may not be what you want. In Python 3, the built-in zip works incrementally, like itertools.izip does in Python 2.
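
For Python 3, here is a minimal sketch of the same comparison applied to csv.reader objects like the ones in the question. The function name csv_files_identical is hypothetical, and itertools.zip_longest is swapped in for plain zip so that a file with extra trailing rows is also reported as different; plain zip would stop silently at the end of the shorter file.

```python
import csv
from itertools import zip_longest


def csv_files_identical(path1, path2):
    """Return True if both CSV files contain the same rows in the same order."""
    missing = object()  # sentinel meaning "this file ran out of rows first"
    with open(path1, newline="") as f1, open(path2, newline="") as f2:
        rows1 = csv.reader(f1)
        rows2 = csv.reader(f2)
        # zip_longest is lazy like zip, but continues until both readers are exhausted
        for row1, row2 in zip_longest(rows1, rows2, fillvalue=missing):
            if row1 != row2:
                return False
    return True
```

Only one row from each file is held at a time, so this should behave the same way on large files as the izip loop above.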

Alex Martelli
  • This does the job! Indeed the problem was that the files were pretty large and `zip()` was loading them all to memory... – Yuval Adam Mar 06 '10 at 17:57
  • Ah, maybe that's why I see no difference. I'm using Python 3.1. – kennytm Mar 06 '10 at 17:58
  • @KennyTM, yep, no "maybe": in Python 3 many things that used to rely on all-in-memory lists in Python 2, have become incremental and iterative. So it's important to always clarify whether questions and answers relate to Python 2 or Python 3 -- in Python 2 the (better;-) incremental and iterative approach is, so to speak, "opt-in" (you need to get it explicitly), in Python 3 it's intrinsic (you need to explicitly call `list` in the relative rare cases where you actually **do** want a list, all in memory at once;-). – Alex Martelli Mar 06 '10 at 18:05
  • Just a small note: you can, if you want, `open` both files within the same `with` statement: `with open(file1) as f1, open(file2) as f2`: – Daan Timmer Jan 10 '14 at 08:47
11

I vote for using zip. The manual suggests: "To loop over two or more sequences at the same time, the entries can be paired with the zip() function."

For example,

list_one = ['nachos', 'sandwich', 'name']
list_two = ['nachos', 'sandwich', 'the game']
for one, two in zip(list_one, list_two):
   if one != two:
      print "Difference found"
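
One caveat: zip stops at the end of the shorter list, so lists of different lengths can look identical if only the paired items are compared. A small Python 3 sketch that also checks the lengths (the helper name first_difference is made up for illustration):

```python
def first_difference(seq_a, seq_b):
    """Return the index of the first position where the sequences differ, or None."""
    for index, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a != b:
            return index
    # zip() stops at the shorter sequence, so a leftover tail counts as a difference
    if len(seq_a) != len(seq_b):
        return min(len(seq_a), len(seq_b))
    return None


list_one = ['nachos', 'sandwich', 'name']
list_two = ['nachos', 'sandwich', 'the game']
print(first_difference(list_one, list_two))  # 2
```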
JAL
4

In lockstep (for Python ≥3):

for line1, line2 in zip(file1, file2):
   # etc.

As a "2D array":

for line1 in file1:
   for line2 in file2:
     # etc.
   # you may need to rewind file2 to the beginning.
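
A rough Python 3 sketch of the "2D array" variant with the rewind made explicit. It assumes plain text files opened with open(), and the function name count_matching_pairs is hypothetical:

```python
def count_matching_pairs(path1, path2):
    """Compare every line of the first file with every line of the second."""
    matches = 0
    with open(path1) as file1, open(path2) as file2:
        for line1 in file1:
            file2.seek(0)  # rewind so the inner loop starts from the first line again
            for line2 in file2:
                if line1 == line2:
                    matches += 1
    return matches
```

Without the seek(0) call, the inner loop would only run for the first line of file1, because file2 is exhausted after one full pass.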
kennytm
  • Thanks, I am looking for the lockstep method. Any idea why this method doesn't work for a `csv.reader()`? – Yuval Adam Mar 06 '10 at 17:49
  • maybe you should clarify that for the "2D array" one might need to reinitialise the inner iterator... – fortran Mar 06 '10 at 17:54
  • 1
    @Yuval, please edit your answer to show exactly how you're trying to use zip with a (one?!) csv.reader -- this comment is totally mysterious. – Alex Martelli Mar 06 '10 at 17:54