I'd like to check whether two HTML snippets differ in their tag structure only, ignoring the text content, and pick out the branch(es) that differ.
For example:
html_1 = """
<p>i love it</p>
"""
html_2 = """
<p>i love it really</p>
"""
They share the same tag structure, so they should be treated as the same. However:
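One way to express "same tag structure, ignore the text" is to record only the start and end tags in document order and compare those sequences. The sketch below uses the standard library's html.parser instead of lxml or BeautifulSoup, so it needs no extra installs. `tag_signature` and `same_structure` are hypothetical helper names, and attributes are deliberately ignored (an assumption; include `attrs` in the recorded events if they should count).

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects start/end tags in document order; text (handle_data) is ignored."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs are ignored on purpose: only the tag name matters here
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_signature(html):
    """Return the list of ("start"/"end", tag) events for an HTML string."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.events


def same_structure(a, b):
    """True if both snippets have identical tag sequences, whatever their text."""
    return tag_signature(a) == tag_signature(b)


html_1 = """
<p>i love it</p>
"""
html_2 = """
<p>i love it really</p>
"""
print(same_structure(html_1, html_2))  # True
```

This only answers "same or not"; it does not say where the difference is.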
html_1 = """
<div>
<p>i love it</p>
</div>
<p>i love it</p>
"""
html_2 = """
<div>
<p>i <em>love</em> it</p>
</div>
<p>i love it</p>
"""
I'd expect it to return the <div> branch, because the tag structures inside it differ (the second version has an extra <em>). Could lxml, BeautifulSoup, or some other library achieve this? I'm looking for a way to actually pick out the differing branches.
Thanks