Python difflib: highlighting differences inline?

Question

When comparing similar lines, I want to highlight the differences on the same line:

a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet

lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet

While difflib.HtmlDiff appears to do this sort of inline highlighting, it produces very verbose markup.

Unfortunately, I have not been able to find another class/method which does not operate on a line-by-line basis.

Am I missing anything? Any pointers would be appreciated!

score 53 · Accepted Answer · edited Aug 30 '22 at 15:45

For your simple example:

import difflib

def show_diff(seqm):
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError("what to do with 'replace' opcode?")
        else:
            raise RuntimeError("unexpected opcode")
    return ''.join(output)

>>> sm = difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'

This works with strings. You should decide what to do with "replace" opcodes.
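One possible way to handle 'replace', sketched below, is to emit the deleted text followed by the inserted text. The function name show_diff_html and the use of html.escape are assumptions added for illustration; they are not part of the original answer. Escaping only matters if the compared strings may themselves contain markup.

```python
import difflib
from html import escape


def show_diff_html(a, b):
    """Mark up how string a changes into string b with <ins>/<del> tags.

    Hypothetical variant of show_diff: 'replace' is rendered as a
    deletion followed by an insertion, and all text is HTML-escaped.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in matcher.get_opcodes():
        if opcode == 'equal':
            output.append(escape(a[a0:a1]))
        elif opcode == 'insert':
            output.append('<ins>' + escape(b[b0:b1]) + '</ins>')
        elif opcode == 'delete':
            output.append('<del>' + escape(a[a0:a1]) + '</del>')
        elif opcode == 'replace':
            # Assumption: show the old text as deleted, then the new text as inserted.
            output.append('<del>' + escape(a[a0:a1]) + '</del>')
            output.append('<ins>' + escape(b[b0:b1]) + '</ins>')
        else:
            raise RuntimeError('unexpected opcode %r' % opcode)
    return ''.join(output)


print(show_diff_html("qabxcd", "abycdf"))
# <del>q</del>ab<del>x</del><ins>y</ins>cd<ins>f</ins>
```

Whether a replacement should read as "old then new" or be styled differently is a presentation choice; this sketch just picks the simplest option.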

edited Aug 30 '22 at 15:45

Josiah Yoder

answered Apr 25 '09 at 12:05

tzot

Thanks very much for this! That's exactly the kind of sample I needed. I had no idea how to get started, but this illustrates it very well. Again, many thanks! – AnC Apr 25 '09 at 14:39
+1, thanks for your example :) What would you suggest doing with 'replace' opcodes? – Viet Oct 27 '12 at 08:13
Well, one suggestion would be to discover some 'replace' opcodes in the wild; the documentation says they can be produced, but I don't remember ever seeing any (IIRC I've only seen 'delete's followed by 'insert's). In any case, what to do with 'replace's is up to the OP. – tzot Oct 28 '12 at 00:34
the example on https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes contains replace opcodes. – Tom Nov 05 '14 at 05:28
For 'replace' opcodes, I just appended the output of both the insert and the delete actions. – ThorSummoner Jun 02 '15 at 16:36

score 9 · Answer 2 · edited Mar 02 '20 at 10:25

Here's an inline differ inspired by @tzot's answer above (also Python 3 compatible):

def inline_diff(a, b):
    import difflib
    matcher = difflib.SequenceMatcher(None, a, b)
    def process_tag(tag, i1, i2, j1, j2):
        if tag == 'replace':
            return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
        if tag == 'delete':
            return '{- ' + matcher.a[i1:i2] + '}'
        if tag == 'equal':
            return matcher.a[i1:i2]
        if tag == 'insert':
            return '{+ ' + matcher.b[j1:j2] + '}'
        # Raise explicitly: an assert would be skipped when Python runs with -O.
        raise ValueError("Unknown tag %r" % tag)
    return ''.join(process_tag(*t) for t in matcher.get_opcodes())

It's not perfect. For example, it would be nice to expand 'replace' opcodes so that they cover the full word that was replaced instead of just the few letters that differ. Still, it's a good place to start.

Sample output:

>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
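One way to address the whole-word limitation mentioned above is to compare lists of word tokens instead of characters, since SequenceMatcher accepts any sequences of hashable items. The sketch below is a hypothetical variant (the name inline_word_diff and the whitespace-based tokenization are assumptions, not part of the original answer); it keeps whitespace runs as their own tokens so that the unchanged text can be reassembled exactly.

```python
import difflib
import re


def inline_word_diff(a, b):
    """Inline diff of two strings at word granularity (illustrative sketch)."""
    # The capturing group makes re.split keep the whitespace runs as tokens.
    a_tokens = re.split(r'(\s+)', a)
    b_tokens = re.split(r'(\s+)', b)
    matcher = difflib.SequenceMatcher(None, a_tokens, b_tokens)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = ''.join(a_tokens[i1:i2])
        new = ''.join(b_tokens[j1:j2])
        if tag == 'equal':
            parts.append(old)
        elif tag == 'delete':
            parts.append('{- ' + old + '}')
        elif tag == 'insert':
            parts.append('{+ ' + new + '}')
        elif tag == 'replace':
            parts.append('{' + old + ' -> ' + new + '}')
        else:
            raise ValueError('Unknown tag %r' % tag)
    return ''.join(parts)


print(inline_word_diff('the cat sat', 'the car sat'))
# the {cat -> car} sat
```

A character-level diff of the same pair would only flag the final letter of "cat"; comparing tokens reports the whole word, at the cost of hiding how similar the two words are.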

edited Mar 02 '20 at 10:25

funnydman

answered Dec 03 '17 at 10:50

orip

I like how you process the 'replace' options. – Josiah Yoder Aug 30 '22 at 15:47
I also like that you translated this to Python 3. I have now translated the original answer by tzot to Python 3 as well. – Josiah Yoder Aug 30 '22 at 15:47
But is it really clearer to replace a loop with nested if with a method called in a Python comprehension? – Josiah Yoder Aug 30 '22 at 15:52
@JosiahYoder fair question about the for loop vs comprehension. Since it's a style issue I'm unsure if there's a definitive answer beyond personal preference. – orip Aug 30 '22 at 18:10

score 3 · Answer 3 · answered Apr 21 '09 at 20:04

difflib.SequenceMatcher can compare two single lines directly, because it works on any pair of sequences, including strings. Its get_opcodes() method returns the "opcodes" that describe how to change the first line into the second.
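As a minimal illustration of what those opcodes look like, the snippet below uses the same pair of strings as the get_opcodes() example in the Python documentation. Each opcode is a tuple (tag, i1, i2, j1, j2), meaning that a[i1:i2] should be changed into b[j1:j2] according to the tag.

```python
import difflib

a = "qabxcd"
b = "abycdf"
matcher = difflib.SequenceMatcher(None, a, b)
opcodes = matcher.get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    # Show the tag and the slices of a and b that it refers to.
    print(tag, repr(a[i1:i2]), '->', repr(b[j1:j2]))

# delete 'q' -> ''
# equal 'ab' -> 'ab'
# replace 'x' -> 'y'
# equal 'cd' -> 'cd'
# insert '' -> 'f'
```

Walking the opcodes in order and emitting either the a slice, the b slice, or both is all that an inline highlighter needs to do.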

answered Apr 21 '09 at 20:04

Adam

I'm afraid I don't quite understand this - yet anyway, so I'll do more digging. Thanks. – AnC Apr 22 '09 at 18:55
What exactly are you trying to do with the differences? Do you want HTML output or were you just using the HtmlDiff because it did in-line diffing? – Adam Apr 22 '09 at 23:15
While HTML output is my primary use case, HtmlDiff's output doesn't allow for easy reuse - that is, if it were simply inserting INS and DEL, that could then easily be transformed to whatever is needed further down the line. – AnC Apr 23 '09 at 13:12
