2

I am trying to output the difference between two text files using Python 2's difflib library, with HtmlDiff generating an HTML file.

import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

res = difflib.HtmlDiff().make_table(V1, V2)

# OUTPUT is the path of the HTML file to write (defined elsewhere)
with open(OUTPUT, "w") as text_file:
    text_file.write(res)

However, the output HTML looks like this in a browser:

[screenshot of the rendered HTML table]

The display is comparing each single character, making it completely unreadable.

What should I modify for the comparison to be more human-friendly? (e.g. full sentences on each side)

If I pass the input as lists of lines instead, the output respects the lines, but it does not display the differences:

import difflib

V1 = ['This has four words']
V2 = ['This has more than four words']

res = difflib.HtmlDiff().make_table(V1, V2)

# OUTPUT is the path of the HTML file to write (defined elsewhere)
with open(OUTPUT, "w") as text_file:
    text_file.write(res)

Resulting HTML (as viewed in a browser):

[screenshot of the rendered HTML table]

hirschme
  • You seem to be reading `V1` from a file opened with encoding utf-8, then re-reading the file into `V1` opened without encoding. Are you sure you need both? Same for `V2`? – DisappointedByUnaccountableMod May 25 '20 at 19:43
  • @barny thanks, that was a mistake, I removed the first file-opening lines – hirschme May 25 '20 at 19:46
  • OK, well, that probably explains the missing utf-8 decoding: you removed the encoding on the open. You are using Python 3, aren't you? If you give your code simple plain ASCII text to compare, does it produce better output? – DisappointedByUnaccountableMod May 25 '20 at 19:48
  • @barny that did solve the encoding problem, however the output still has the same problem. I updated the code to be clearer and easier to reproduce (this is python 2) – hirschme May 25 '20 at 20:15
  • “Not displaying the differences” - so you want markup? Try https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline – DisappointedByUnaccountableMod May 25 '20 at 20:38

3 Answers

1

To get markup, you can use difflib.SequenceMatcher, as in the function defined in this answer: https://stackoverflow.com/a/788780/2318649

That gives this code:

import difflib

def show_diff(seqm):
    # function from https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError("what to do with 'replace' opcode?")
        else:
            # str.format instead of an f-string, so this also runs on Python 2
            raise RuntimeError("unexpected opcode {}".format(opcode))
    return ''.join(output)


V1 = 'This has four words but fewer than eleven'
V2 = 'This has more than four words'


sm = difflib.SequenceMatcher(None, V1, V2)

html = "<html><body>" + show_diff(sm) + "</body></html>"

# the with-block closes the file once the write is done
with open("output.html", "wt") as f:
    f.write(html)

which produces:

[screenshot of the rendered output]

  • This code fails for V1 = 'This has 1 four words but fewer than eleven' and V2 = 'This has 2 more than four words', because it hits the "replace" opcode. You can just replace the NotImplementedError with: output.append("<ins>" + seqm.b[b0:b1] + "</ins>") output.append("<del>" + seqm.a[a0:a1] + "</del>") – bruno.braga Jul 26 '23 at 10:40
  • OK, but it was clearly not implemented to handle that; you’re adding a new use case, so you need different code. – DisappointedByUnaccountableMod Jul 27 '23 at 22:26
  • not sure I follow. this should work regardless. You have 2 strings to compare, that's all. – bruno.braga Aug 01 '23 at 12:19
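
For readers who hit the 'replace' opcode discussed in these comments, here is a minimal sketch of one way to handle it: treat a replacement as a deletion followed by an insertion. The function name `show_diff_with_replace` and the del-then-ins ordering are assumptions made for illustration, not part of the original answer, and the text is not HTML-escaped.

```python
import difflib


def show_diff_with_replace(a, b):
    """Return HTML marking up how string a turns into string b.

    Hypothetical variant of show_diff: a 'replace' opcode is emitted as
    the deleted text from a followed by the inserted text from b.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in matcher.get_opcodes():
        if opcode == 'equal':
            output.append(a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + a[a0:a1] + "</del>")
        elif opcode == 'replace':
            # assumption: show the old text struck out, then the new text
            output.append("<del>" + a[a0:a1] + "</del>")
            output.append("<ins>" + b[b0:b1] + "</ins>")
    return ''.join(output)


markup = show_diff_with_replace('cat', 'cut')
```

With this variant, the two strings from the first comment no longer raise, since all four opcodes that `get_opcodes()` can return are covered.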
0

The problem is you don't have the required styles. Try using make_file instead of make_table, then you'll see there is some CSS that will make the colors show up as you're expecting.
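
As a rough illustration of the point above, the sketch below builds both outputs from the question's list inputs and checks which one carries a stylesheet. The variable names and the `fromdesc`/`todesc` labels are chosen here for illustration.

```python
import difflib

V1 = ['This has four words']
V2 = ['This has more than four words']

differ = difflib.HtmlDiff()

# make_table returns only the <table> markup; the class attributes it
# uses have no CSS behind them unless the surrounding page supplies it.
table_only = differ.make_table(V1, V2)

# make_file returns a complete HTML document, including a <style> block
# that defines the highlight colours for added/changed/deleted text.
full_page = differ.make_file(V1, V2, fromdesc='V1', todesc='V2')

has_css_in_table = '<style' in table_only
has_css_in_page = '<style' in full_page
```

Writing `full_page` to the output file in place of `res` should then show the highlighting in a browser.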

Randomibis
0

This is an old question, but I struggled with the same problem myself for a few days before I finally pieced something together. It looks like this:

html = difflib.HtmlDiff().make_file(a.split(' '), b.split(' '), fromdesc="original", todesc="modified")

After adding that simple little split, each word gets its own row in the diff instead of each character.
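
A likely reason the split helps: HtmlDiff treats each argument as a sequence of lines, and iterating over a plain string yields one character at a time. The sketch below contrasts the two calls using the strings from the question; the variable names are illustrative.

```python
import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'

# Iterating a str yields single characters, so HtmlDiff sees each
# character as a separate "line" to compare.
first_units = list(V1)[:4]

# Passing the strings directly: one table row per character.
by_char = difflib.HtmlDiff().make_table(V1, V2)

# Splitting on spaces first: one table row per word.
by_word = difflib.HtmlDiff().make_table(V1.split(' '), V2.split(' '))
```

For text read from real files, `text.splitlines()` is the analogous step when one row per line is wanted rather than one row per word.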