lxml can do something similar to what you want. From the docs:
>>> from lxml.html.diff import htmldiff
>>> doc1 = '''<p>Here is some text.</p>'''
>>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>'''
>>> print(htmldiff(doc1, doc2))
<p>Here is <ins><b>a lot</b> of <i>text</i>.</ins> <del>some text.</del> </p>
I don't know of any other Python library for this specific task, but you may want to look into word-by-word diffs. They may approximate what you want.
One example is this one, implemented in both PHP and Python (save it as diff.py, then import diff):
>>> diff.htmlDiff(a, b)
'<del><p>i</del> <ins><h2>i</ins> love <del>it</p></del> <ins>it </p></ins>'
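If you would rather not depend on an external script, a word-by-word diff can also be roughly sketched with the standard library's difflib. This is only a minimal illustration, not the linked diff.py and not what lxml does internally: the name word_diff is made up for this example, and it assumes the inputs are plain text, so any markup in them is escaped rather than parsed.

```python
import difflib
from html import escape


def word_diff(old, new):
    """Return HTML marking word-level changes with <del> and <ins>.

    Hypothetical helper for illustration: inputs are treated as plain
    text and split on whitespace; existing markup is escaped, not parsed.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Words present in both versions are emitted unchanged.
            out.append(escape(" ".join(a[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            # Words only in the old version.
            out.append("<del>" + escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            # Words only in the new version.
            out.append("<ins>" + escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(out)


print(word_diff("Here is some text.", "Here is a lot of text."))
# Here is <del>some</del> <ins>a lot of</ins> text.
```

Because this splits on whitespace and escapes everything, it will not keep tags balanced the way htmldiff does. It is only suitable when the two inputs are text rather than HTML.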