
This might sound a little stupid, but I have been having a hard time figuring it out. I have two text files, and all I want to do is compare each line of the first file with every line of the second file. So far I have only tested a small part of my code:

for line1 in file1:
    print line1
    for line2 in file2:
        print line2

I thought this code would print a line from the first file followed by all the lines from the second file, and repeat that for every line of the first file. But it behaves differently. It gives me this:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3
in file 1 line 2

What I expect to see:

in file 1 line 1
in file 2 line 1
in file 2 line 2
in file 2 line 3

in file 1 line 2
in file 2 line 1
in file 2 line 2
in file 2 line 3

Any idea of what I might be doing wrong here?

PLEASE NOTE: I don't want to simply check whether whole lines are equal. I need to do some string operations on them first, so zip and similar tools won't help me.

Thanks in advance

ahajib
  • Are you sure the indentation in your script is the same as what you have here? The code you posted should behave as you expected: print the first line of `file1`, print all of `file2`, print the second line of `file1`, print all of `file2`, etc. – MackM Sep 30 '14 at 18:20
  • @MackM Yes, I am sure, and this is the result I got: in file 1 line 1 in file 2 line 1 in file 2 line 2 in file 2 line 3 in file 1 line 2 – ahajib Sep 30 '14 at 18:23
  • @ddenhartog It is not, because the last line, which is the second line of my first file, should be followed by all the lines from the second file :) – ahajib Sep 30 '14 at 18:33
  • The output your script gives is what you described. Please post the output you expect/want. – MackM Sep 30 '14 at 18:36
  • I am not able to reproduce your problem, the code you provided is giving me the output you expect. Could you please post the contents of `file1` and `file2`? – MackM Sep 30 '14 at 18:53

2 Answers

3

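What has happened here is that a file object is an iterator, and the inner loop exhausts it: the first pass of the outer loop reads file2 to the end, so on every later pass the inner loop has nothing left to yield. You can see that by trying to loop over the same file twice: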

>>> f2=open("CLI.md")
>>> for i in f2:
...     print(i)
... 
The CLI
(file contents...)
>>> for i in f2:
...     print(i)
... 
>>>

A simple way of handling that here is to read the file used in the inner loop into a list before looping (this keeps all of its lines in memory):

file2_lines = list(file2)
for line1 in file1:
    print line1
    for line2 in file2_lines:
        print line2
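If the second file is too large to keep in memory, another option is to rewind it before each inner pass. The following is only a minimal sketch under assumptions: it uses Python 3 syntax, and the two `io.StringIO` objects are hypothetical stand-ins for the asker's real files, which would normally come from `open(...)`.

```python
import io

# Hypothetical stand-ins for the two open files; real code would use open(...).
file1 = io.StringIO("in file 1 line 1\nin file 1 line 2\n")
file2 = io.StringIO("in file 2 line 1\nin file 2 line 2\nin file 2 line 3\n")

pairs = []
for line1 in file1:
    file2.seek(0)  # rewind so the inner loop starts from the first line again
    for line2 in file2:
        # Any string operations on line1 and line2 would go here.
        pairs.append((line1.strip(), line2.strip()))

for pair in pairs:
    print(pair)
```

The trade-off is that the second file is re-read once per line of the first file, which saves memory at the cost of extra I/O.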

Also see: exhausted iterators - what to do about them?
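Pairing every line of one file with every line of the other can also be written with `itertools.product`. This is a sketch assuming Python 3; the `io.StringIO` objects and their contents are made up for illustration and stand in for real file objects.

```python
import io
from itertools import product

# Hypothetical stand-ins for the two open files.
file1 = io.StringIO("a\nb\n")
file2 = io.StringIO("x\ny\nz\n")

# product() reads both inputs completely into memory before yielding pairs,
# so the exhausted-iterator problem does not arise here.
pairs = [(line1.strip(), line2.strip()) for line1, line2 in product(file1, file2)]
print(pairs)
```

Like the list approach, this holds the contents of both files in memory, so it suits files of moderate size.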

matsjoyce
-1

zip may be your friend here.

For example,

for line_a, line_b in zip(file_1, file_2):
    # do something with your strings
    pass

Sample terminal code:

>>> file_1 = ['a', 'b', 'c', 'd']
>>> file_2 = ['a', 'one', 'c', 'd', 'e']
>>> for a, b in zip(file_1, file_2):
...   if a == b:
...     print('equal!')
...   else:
...     print('nope!')
... 
equal!
nope!
equal!
equal!
>>> for a, b in zip(file_2, file_1):
...   print(a, b)
... 
a a
one b
c c
d d

Notice anything strange?

As per the Python Docs "zip() should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables. If those values are important, use itertools.zip_longest() instead."
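To illustrate the quoted advice, here is a short sketch that reuses the same two sample lists with `itertools.zip_longest`. It assumes Python 3 (in Python 2 the function is named `izip_longest`), and the name `padded` is introduced only for this example.

```python
from itertools import zip_longest

file_1 = ['a', 'b', 'c', 'd']
file_2 = ['a', 'one', 'c', 'd', 'e']

# Unlike zip(), zip_longest() keeps going until the longer input is used up,
# filling the missing values from the shorter input with None by default.
padded = list(zip_longest(file_1, file_2))
print(padded)
```

The trailing `'e'` from `file_2` is no longer dropped; it is paired with `None`.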

Douglas Denhartog
  • This did not answer the question, because OP did not want to consider corresponding file lines paired up, but instead wanted to consider each line from one file against each line of the other file (a cartesian product of lines from each source). – Karl Knechtel Jan 08 '23 at 09:38