This type of question has been asked several times, but I can't seem to find the exact same scenario using Python 3 (3.5 in my case).

I have two files, either txt or csv. I need to compare them row by row and write the differences to a new file, with each row's differences on its own line.

Here is what I have tried so far. It is close, but I can't figure out how to put each row's differences on their own line; I only seem to be able to put each word on a new line or everything on one line.

a = open('test1.txt').read().split()
b = open('test2.txt').read().split()
c = [x for x in b if x not in a]
open('test3.txt', 'wt').write('\n'.join(c)+'\n')

The '\n' before the .join puts every word on a new row. I don't want each differing word on its own row; I want all the differences from one row on the same row. I hope that makes sense.

Example: test1.txt:

how are you
I am well
all is good

test2.txt:

how are you
I like toys
all is not well

output: test3.txt

am well
good

I have also tried this code for CSV, but I get an error.

import csv

f1 = open ("test1.csv")
oldFile1 = csv.reader(f1)
oldList1 = []
for row in oldFile1:
    oldList1.append(row)

f2 = open ("test2.csv")
oldFile2 = csv.reader(f2)
oldList2 = []
for row in oldFile2:
    oldList2.append(row)

f1.close()
f2.close()

print [row for row in oldList1 if row not in oldList2]

I get this error. I think it's related to my being on version 3.5 while this code was written for 2.7?

File "test3.py", line 18
    print [row for row in oldList1 if row not in oldList2]
                 ^
SyntaxError: Missing parentheses in call to 'print'

Thank you for your help

moore1emu

3 Answers

1

The problem with your first code is that you call split() on the whole contents of each file, which splits on all whitespace (not only newlines), so the line boundaries are lost.

Instead, you can zip the two files line by line, split each pair of lines, and compare the words:

with open('test1.txt') as f1, open('test2.txt') as f2, open('result.txt', 'w') as f3:
    for line1, line2 in zip(f1, f2):
        sp1 = line1.split()
        sp2 = line2.split()
        f3.write(' '.join([i for i in sp1 if i not in sp2]) + '\n')
Mazdak
  • It's outputting everything on the same row. If I replace ' ' with '/n' it's still not giving the correct row-by-row difference output – moore1emu May 18 '16 at 14:53
  • @moore1emu Fixed ;-), please consider accepting the answer if it works. – Mazdak May 18 '16 at 14:54
  • Awesome, I was wondering if it needed a + '\n' at the end. You are awesome sir, thank you for the quick response. Will this work the same if the file is a csv? – moore1emu May 18 '16 at 14:57
  • @moore1emu You're welcome. If you are dealing with `csv`, it's better to open the files using the `csv` module so that you don't need to split the lines; the rest is the same. – Mazdak May 18 '16 at 15:00
  • Here is a new wrinkle to my question: what if the rows do not line up, for example if there are extra blank spaces or extra rows in one of the files? – moore1emu May 18 '16 at 16:04
  • @moore1emu In that case you'd better zip your file objects `f1` & `f2` with the `itertools.zip_longest` function, which accepts a `fillvalue` argument to fill in the missing items (here, lines); then you can process them in any way you like. You can see a lot of examples on SO related to this case. – Mazdak May 18 '16 at 16:12
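
The `zip_longest` suggestion from the last comment could look roughly like the sketch below. It assumes that a missing line should be treated as an empty one (`fillvalue=''`); the helper names `diff_lines` and `write_diff` are made up for illustration and are not part of the original answer.

```python
from itertools import zip_longest


def diff_lines(lines1, lines2):
    """Yield, per line pair, the words of the first line that are missing from the second."""
    # fillvalue='' treats a line missing from the shorter input as empty
    # (an assumption of this sketch, not something the answer specifies).
    for line1, line2 in zip_longest(lines1, lines2, fillvalue=''):
        words2 = set(line2.split())
        yield ' '.join(word for word in line1.split() if word not in words2)


def write_diff(path1, path2, out_path):
    """Write one line of differences per input row to out_path."""
    with open(path1) as f1, open(path2) as f2, open(out_path, 'w') as out:
        for diff in diff_lines(f1, f2):
            out.write(diff + '\n')
```

With the files from the question this would be called as `write_diff('test1.txt', 'test2.txt', 'test3.txt')`. Note that it only pads the shorter file; it does not try to realign rows that have shifted because of an extra row in the middle.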
0

Additionally, you could look into difflib if you need fancier output. Here's a nice tutorial and a fitting question.
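
As a minimal sketch of the difflib route, the standard library's `difflib.unified_diff` produces output similar to the `diff -u` command. The helper names `unified` and `diff_files` and the default labels are chosen here purely for illustration.

```python
import difflib


def unified(lines1, lines2, name1='test1.txt', name2='test2.txt'):
    """Return a unified diff of two lists of lines as a single string."""
    # The lines are expected to keep their trailing newlines,
    # as they do when read with readlines().
    return ''.join(difflib.unified_diff(lines1, lines2,
                                        fromfile=name1, tofile=name2))


def diff_files(path1, path2):
    """Read both files and return their unified diff."""
    with open(path1) as f1, open(path2) as f2:
        return unified(f1.readlines(), f2.readlines(), path1, path2)
```

Unlike a plain `zip`, this also copes with rows that have been inserted or removed in one file, because difflib aligns matching lines before reporting the differences.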

renefritze
0

The problem with the second code is simply that print works differently in Python 2 and 3: it is a statement in Python 2 but a function in Python 3. If you just add parentheses, it should work, like this:

print([row for row in oldList1 if row not in oldList2])
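
For what it's worth, the whole CSV script from the question could be condensed along these lines. This is only a sketch: the function name `rows_only_in_first` is hypothetical, and it keeps the question's logic of reporting every row of the first file that does not appear anywhere in the second.

```python
import csv


def rows_only_in_first(path1, path2):
    """Return the rows of the first CSV file that appear nowhere in the second."""
    # The with block closes both files automatically; newline='' is what the
    # csv module documentation recommends when opening files for csv.reader.
    with open(path1, newline='') as f1, open(path2, newline='') as f2:
        rows1 = list(csv.reader(f1))
        rows2 = list(csv.reader(f2))
    return [row for row in rows1 if row not in rows2]
```

Calling `print(rows_only_in_first('test1.csv', 'test2.csv'))` would then print the same list as the corrected line above.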