28

As part of some Python tests using the unittest framework, I need to compare two relatively short text files, where one is a test output file and the other is a reference file.

The immediate approach is:

import filecmp
...
self.assertTrue(filecmp.cmp(tst_path, ref_path, shallow=False))

It works fine if the test passes, but in the event of failure, the output is not much help:

AssertionError: False is not true

Is there a better way of comparing two files as part of the unittest framework, so some useful output is generated in case of mismatch?

EquipDev

5 Answers

25

To get a report of which line differs, along with a printout of that line, use assertListEqual on the contents, e.g.:

self.assertListEqual(
    list(open(tst_path)),
    list(open(ref_path)))
Michael Mior
Ethan Bradford
  • Under my understanding, this will leave the files open until the garbage-collector notices, which leaves the files locked for too long under Windows. Consider using context managers to limit the time the files are open. – Oddthinking Sep 04 '21 at 06:13
  • @Oddthinking Probably something like: with open(...) as tst, open(...) as ref: ... Open the files in a with statement; list() works on them as well, so there is no need for io.open and such. They should be closed once execution leaves the with block. – MolbOrg Oct 09 '21 at 18:01
  • 2
    Yes, it's not a lot more complicated to include the auto-closing, e.g. with io.open(tst_path) as tst_f, io.open(ref_path) as ref_f: self.assertListEqual(list(tst_f), list(ref_f)) – Ethan Bradford Oct 12 '21 at 16:42
10

All you need to do is add your own message for the error condition (see the unittest documentation).

self.assertTrue(filecmp.cmp(...), 'Your error message')

Dan
  • 8
    A reminder for those who care: if the two files differ, the failure shows only your custom message, with no detail about what differs. – Tengerye Jun 18 '21 at 08:09
2

Comparing the files as lists of lines yields meaningful assertion errors:

assert [row for row in open(actual_path)] == [row for row in open(expected_path)]

You could use that each time you need to compare files, or put it in a function. You could also compare the files as text strings instead of lists.

Adrien H
  • 1
    In the event of multiple rows with mismatches, this will only report the first one. Not ideal. – Clint Eastwood Aug 23 '21 at 21:28
  • @ClintEastwood You can always join them I guess. Depending on your use case, it might be enough to fail with only one reported line. – Adrien H Aug 25 '21 at 07:52
0

Isn't it better to compare the content of the two files? For example, if they are text files, compare the text of the two files; this will produce a more meaningful error message.

Bart
  • 1
    The intention is to compare the contents, so I added ', shallow=False' to 'filecmp.cmp' to make that clear. – EquipDev Feb 28 '17 at 15:31
0

You can use the built-in difflib module for this.

Use the unified_diff format, which is plain text and will be empty if the contents of the files match. The file contents need to be read into lists first, and unified_diff returns a generator, so we wrap it in a list so we can inspect it. Here's a template you can use:

from difflib import unified_diff
with open("my/expected/file.txt", "r") as f:
    expected_lines = f.readlines()
with open("my/actual/file.txt", "r") as f:
    actual_lines = f.readlines()

diff = list(unified_diff(expected_lines, actual_lines))
assert diff == [], "Unexpected file contents:\n" + "".join(diff)

My only complaint here is that I wish I had colors. If you wanted them really badly, you could implement your own diff formatting based on the output of get_grouped_opcodes from the same module.
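
For what it's worth, here is a rough sketch of what such a formatter might look like. It is an illustration only: colored_diff and the ANSI color constants are made-up names, the output format is simplified (no hunk headers), and it assumes a terminal that understands ANSI escape codes.

```python
from difflib import SequenceMatcher

# ANSI escape codes (assumes the terminal supports them).
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def colored_diff(expected_lines, actual_lines, context=3):
    """Return a colored diff string; empty if the two line lists match."""
    matcher = SequenceMatcher(None, expected_lines, actual_lines)
    out = []
    for group in matcher.get_grouped_opcodes(context):
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                # Unchanged context lines, printed without color.
                out.extend("  " + line for line in expected_lines[i1:i2])
                continue
            if tag in ("replace", "delete"):
                # Lines only in the expected file, in red.
                out.extend(RED + "- " + line.rstrip("\n") + RESET + "\n"
                           for line in expected_lines[i1:i2])
            if tag in ("replace", "insert"):
                # Lines only in the actual file, in green.
                out.extend(GREEN + "+ " + line.rstrip("\n") + RESET + "\n"
                           for line in actual_lines[j1:j2])
    return "".join(out)
```

It could be dropped into the template above, e.g. diff = colored_diff(expected_lines, actual_lines) followed by assert diff == "", "Unexpected file contents:\n" + diff.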

Nate Glenn