34

When comparing similar lines, I want to highlight the differences on the same line:

a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet

lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet

While difflib.HtmlDiff appears to do this sort of inline highlighting, it produces very verbose markup.

Unfortunately, I have not been able to find another class/method which does not operate on a line-by-line basis.

Am I missing anything? Any pointers would be appreciated!

AnC

3 Answers

53

For your simple example:

import difflib


def show_diff(seqm):
    """Unify operations between two compared strings.

    seqm is a difflib.SequenceMatcher instance whose a & b are strings.
    """
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            # Unchanged text is copied through as-is
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            # Text present only in b
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            # Text present only in a
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            raise NotImplementedError("what to do with 'replace' opcode?")
        else:
            raise RuntimeError("unexpected opcode")
    return ''.join(output)

>>> sm= difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'

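This works with strings. You still need to decide how to handle 'replace' opcodes, which SequenceMatcher can emit when a span of a is to be replaced by a different span of b; the function above raises NotImplementedError for them.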

Josiah Yoder
tzot
  • Thanks very much for this! That's exactly the kind of sample I needed. I had no idea how to get started, but this illustrates it very well. Again, many thanks! – AnC Apr 25 '09 at 14:39
  • +1 thanks for your example :) What would you suggest doing with replace opcodes? – Viet Oct 27 '12 at 08:13
  • Well, one suggestion would be to discover some 'replace' opcodes in the wild; the documentation says they can be produced, but I don't remember ever seeing any (IIRC I've only seen 'delete's followed by 'insert's). In any case, what to do with 'replace's is up to the OP. – tzot Oct 28 '12 at 00:34
  • The example at https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_opcodes contains replace opcodes. – Tom Nov 05 '14 at 05:28
  • For replace opcodes, I just appended both the output of the delete action and the output of the insert action. – ThorSummoner Jun 02 '15 at 16:36
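
To illustrate the two comments above: the get_opcodes example in the difflib documentation compares "qabxcd" with "abycdf", and that pair does produce a 'replace' opcode. The sketch below follows ThorSummoner's suggestion and renders a 'replace' as a deletion followed by an insertion. The name show_diff_with_replace and the del-before-ins ordering are assumptions made for this example, not part of the original answer.

```python
import difflib


def show_diff_with_replace(a, b):
    """Inline diff of two strings; 'replace' becomes <del>old</del><ins>new</ins>."""
    matcher = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in matcher.get_opcodes():
        if opcode == 'equal':
            output.append(a[a0:a1])
        # 'replace' deliberately matches both of the following branches
        if opcode in ('delete', 'replace'):
            output.append("<del>" + a[a0:a1] + "</del>")
        if opcode in ('insert', 'replace'):
            output.append("<ins>" + b[b0:b1] + "</ins>")
    return ''.join(output)


# The string pair used in the difflib documentation for get_opcodes()
tags = [op[0] for op in
        difflib.SequenceMatcher(None, "qabxcd", "abycdf").get_opcodes()]
print(tags)
# → ['delete', 'equal', 'replace', 'equal', 'insert']
print(show_diff_with_replace("qabxcd", "abycdf"))
# → <del>q</del>ab<del>x</del><ins>y</ins>cd<ins>f</ins>
```

Note that the text is not HTML-escaped here; if the inputs can contain < or &, escape each slice before wrapping it.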
9

Here's an inline differ inspired by @tzot's answer above (also Python 3 compatible):

def inline_diff(a, b):
    import difflib
    matcher = difflib.SequenceMatcher(None, a, b)
    def process_tag(tag, i1, i2, j1, j2):
        if tag == 'replace':
            return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
        if tag == 'delete':
            return '{- ' + matcher.a[i1:i2] + '}'
        if tag == 'equal':
            return matcher.a[i1:i2]
        if tag == 'insert':
            return '{+ ' + matcher.b[j1:j2] + '}'
        assert False, "Unknown tag %r"%tag
    return ''.join(process_tag(*t) for t in matcher.get_opcodes())

It's not perfect: for example, it would be nice to expand 'replace' opcodes to cover the full word replaced instead of just the few letters that differ. But it's a good place to start.

Sample output:

>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
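
One possible way to get whole-word replacements, sketched here under some assumptions: SequenceMatcher accepts any sequences of hashable items, so it can compare lists of words instead of strings of characters. The function name inline_word_diff is hypothetical, and this version splits on whitespace and rejoins with single spaces, so the original spacing is not preserved.

```python
import difflib


def inline_word_diff(a, b):
    """Word-level inline diff; runs of whitespace are normalized to one space."""
    a_words, b_words = a.split(), b.split()
    # Compare lists of words rather than strings of characters
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = ' '.join(a_words[i1:i2])
        new = ' '.join(b_words[j1:j2])
        if tag == 'equal':
            parts.append(old)
        elif tag == 'replace':
            parts.append('{' + old + ' -> ' + new + '}')
        elif tag == 'delete':
            parts.append('{- ' + old + '}')
        elif tag == 'insert':
            parts.append('{+ ' + new + '}')
    return ' '.join(parts)


print(inline_word_diff('the cat sat', 'the bat sat'))
# → the {cat -> bat} sat
```

The trade-off is granularity: a one-letter typo fix is now reported as a whole-word replacement.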
funnydman
orip
  • I like how you process the 'replace' options. – Josiah Yoder Aug 30 '22 at 15:47
  • I also like that you translated this to Python 3. I have now translated the original answer by tzot to Python 3 as well. – Josiah Yoder Aug 30 '22 at 15:47
  • But is it really clearer to replace a loop with nested if with a method called in a Python comprehension? – Josiah Yoder Aug 30 '22 at 15:52
  • @JosiahYoder fair question about the for loop vs comprehension. Since it's a style issue I'm unsure if there's a definitive answer beyond personal preference. – orip Aug 30 '22 at 18:10
3

difflib.SequenceMatcher operates on arbitrary sequences, so it can compare two single lines character by character. Its get_opcodes() method returns the "opcodes" that describe how to change the first line to make it the second line.
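
A minimal sketch of what those opcodes look like for the two lines in the question. Each opcode is a tuple (tag, i1, i2, j1, j2), where a[i1:i2] is the affected slice of the first line and b[j1:j2] the corresponding slice of the second. The helper name describe_opcodes is made up for this illustration.

```python
import difflib


def describe_opcodes(a, b):
    """Return (tag, slice_of_a, slice_of_b) for each step that turns a into b."""
    matcher = difflib.SequenceMatcher(None, a, b)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


steps = describe_opcodes("lorem ipsum dolor sit amet",
                         "lorem foo ipsum dolor amet")
for step in steps:
    print(step)
# → ('equal', 'lorem', 'lorem')
# → ('insert', '', ' foo')
# → ('equal', ' ipsum dolor ', ' ipsum dolor ')
# → ('delete', 'sit ', '')
# → ('equal', 'amet', 'amet')
```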

Adam
  • I'm afraid I don't quite understand this - yet anyway, so I'll do more digging. Thanks. – AnC Apr 22 '09 at 18:55
  • What exactly are you trying to do with the differences? Do you want HTML output or were you just using the HtmlDiff because it did in-line diffing? – Adam Apr 22 '09 at 23:15
  • While HTML output is my primary use case, HtmlDiff's output doesn't allow for easy reuse - that is, if it were simply inserting INS and DEL, that could then easily be transformed to whatever is needed further down the line. – AnC Apr 23 '09 at 13:12
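
One way to get the reusable form described in the last comment, as a rough sketch rather than anything difflib provides directly: first reduce the opcodes to neutral (kind, text) pairs, then render those pairs in whatever format is needed. The names diff_segments and to_html are hypothetical; html.escape is from the standard library and is used so that markup characters in the input do not leak into the output.

```python
import difflib
from html import escape


def diff_segments(a, b):
    """Yield (kind, text) pairs, where kind is 'equal', 'del' or 'ins'."""
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            yield ('equal', a[i1:i2])
        # A 'replace' is emitted as a 'del' followed by an 'ins'
        if tag in ('delete', 'replace'):
            yield ('del', a[i1:i2])
        if tag in ('insert', 'replace'):
            yield ('ins', b[j1:j2])


def to_html(segments):
    """Render the pairs as text with <ins>/<del> tags, escaping the content."""
    out = []
    for kind, text in segments:
        text = escape(text)
        if kind == 'equal':
            out.append(text)
        else:
            out.append('<%s>%s</%s>' % (kind, text, kind))
    return ''.join(out)


print(to_html(diff_segments("a<b", "a<c")))
# → a&lt;<del>b</del><ins>c</ins>
```

A different renderer (plain text, ANSI colours, and so on) can consume the same pairs without touching the diffing step.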