
I have two files, and for each byte read from one file, something has to be done with the byte at the same position in the other (XORed, say).

I was hoping I could do something like the following, but I've had no luck so far:

c = 0

f1 = list(file1.read())
f2 = list(file2.read())


for (a, b) in f1, f2: # set a and b for each byte in turn in f1 and f2
    c = a ^ b

To me this initially felt quite Pythonic, but I'm beginning to doubt it now.

Any pointers very welcome!

peedurrr

1 Answer


Use the zip() function:

for a, b in zip(f1, f2):
    c = a ^ b  # a and b are ints when the files were opened in binary mode in Python 3
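To see why the loop in the question fails and what zip() does instead, here is a small Python 3 sketch. The two short byte strings are made-up sample data standing in for the file contents; in Python 3, list() over bytes gives a list of ints, so ^ works on the items directly.

```python
# Hypothetical sample data standing in for file1.read() and file2.read().
f1 = list(b"\x0f\xf0\xaa")
f2 = list(b"\xff\xff\x55")

# `for (a, b) in f1, f2` iterates over the 2-tuple (f1, f2) and tries to
# unpack each whole list into a and b, which fails for a 3-item list.
try:
    for (a, b) in f1, f2:
        pass
except ValueError as exc:
    print("tuple loop failed:", exc)

# zip() instead pairs up the items that sit at the same position.
xored = bytes(a ^ b for a, b in zip(f1, f2))
print(xored.hex())  # → f00fff
```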

If the files are large, you'd probably want to read them in smaller chunks rather than loading each whole file into memory.

For text files, just loop directly over the files to yield lines:

try:
    from itertools import izip
except ImportError:  # Python 3, use builtin zip
    izip = zip

with file1, file2:
    for line1, line2 in izip(file1, file2):
        for a, b in izip(line1, line2):
            c = ord(a) ^ ord(b)  # text mode yields characters, so convert with ord()

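where we use the izip() function instead (only in Python 2; in Python 3 the built-in zip() is already lazy) to avoid reading the whole files into memory first. This also assumes that corresponding lines are the same length, since izip() silently stops at the shorter of the two.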
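A Python 3 sketch of the line-by-line approach, using io.StringIO objects as hypothetical stand-ins for the two open text files. The second pair of lines deliberately differs in length to show the caveat about equal line lengths: 'z' ends up paired with the other line's newline, and the last position is dropped.

```python
import io

# Hypothetical stand-ins for two open text files.
file1 = io.StringIO("abc\nxyz\n")
file2 = io.StringIO("ABC\nXY\n")

results = []
with file1, file2:
    for line1, line2 in zip(file1, file2):  # lazy in Python 3, like izip
        # Characters must be converted with ord() before XORing.
        results.append([ord(a) ^ ord(b) for a, b in zip(line1, line2)])

print(results)  # → [[32, 32, 32, 0], [32, 32, 112]]
```

If mismatched lengths should be padded rather than truncated, itertools.zip_longest() with a fillvalue is one option.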

For binary files, read in chunks using a power-of-two chunk size:

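file1_it = iter(lambda: file1.read(2048), b'')  # call read() until it returns b''
file2_it = iter(lambda: file2.read(2048), b'')

for chunk1, chunk2 in izip(file1_it, file2_it):
    for a, b in izip(chunk1, chunk2):
        c = a ^ b  # Python 3 bytes yield ints; in Python 2 use ord(a) ^ ord(b)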
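Putting the chunked approach together, here is a self-contained Python 3 sketch that writes the XORed bytes to an output stream. The function name xor_streams and the in-memory io.BytesIO data are hypothetical; functools.partial(f.read, size) is used as an equivalent to the lambda in the two-argument iter(callable, sentinel) form. It assumes each read() returns a full chunk until end of file, which is how buffered binary files behave, so the chunks of the two streams stay aligned.

```python
import io
from functools import partial

CHUNK_SIZE = 2048  # power-of-two chunk size

def xor_streams(src1, src2, dst, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: XOR two binary streams byte by byte into dst.

    Like zip(), it stops at the end of the shorter stream.
    """
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    chunks1 = iter(partial(src1.read, chunk_size), b"")
    chunks2 = iter(partial(src2.read, chunk_size), b"")
    for chunk1, chunk2 in zip(chunks1, chunks2):
        # Iterating over bytes yields ints in Python 3, so ^ works directly.
        dst.write(bytes(a ^ b for a, b in zip(chunk1, chunk2)))

# Hypothetical in-memory data standing in for files opened with "rb"/"wb".
src1 = io.BytesIO(bytes(range(256)) * 20)
src2 = io.BytesIO(b"\xff" * 5120)
out = io.BytesIO()
xor_streams(src1, src2, out)
```

With real files, the same call would take objects from open(path, "rb") and open(path, "wb").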
thefourtheye
Martijn Pieters
  • I would like to add that the reason izip is not in Python 3 is that in Python 3, zip already works like izip. – kazagistar Jan 13 '14 at 16:42
  • @kazagistar: yes, and I didn't want to mask the built-in `zip()` in Python 2, hence the choice to rebind `zip` to `izip` in 3 rather than the other way around, rebinding `izip` to `zip` in Python 2. – Martijn Pieters Jan 13 '14 at 16:43
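The point that Python 3's zip() already behaves like Python 2's izip() can be checked with a short sketch: zip() returns a lazy iterator, so it can even pair an infinite iterator (itertools.count()) with a finite one.

```python
import itertools

# zip() in Python 3 returns an iterator and only pulls items as needed,
# so pairing an infinite counter with a 3-character string is fine.
pairs = zip(itertools.count(), "abc")
print(type(pairs).__name__)  # → zip
print(list(pairs))           # → [(0, 'a'), (1, 'b'), (2, 'c')]
```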