Assume I have this simple HTML:
<html>
<body>
<!--[if !mso]><!-->
<a href="http://link1.com">Link 1</a>
<!--<![endif]-->
<!--[if mso]>
<a href="http://link2.com">Link 2</a>
<![endif]-->
</body>
</html>
Is there a way to use lxml.html or BeautifulSoup to get both links? Currently I get only one. In other words, I want the parser to look inside HTML conditional comments as well (I'm not sure what the technical term is). In the sessions below, s is a string containing the HTML above.
lxml.html
>>> from lxml import html
>>> doc = html.fromstring(s)
>>> list(doc.iterlinks())
<<< [(<Element a at 0x10f7f7bf0>, 'href', 'http://link1.com', 0)]
BeautifulSoup
>>> from BeautifulSoup import BeautifulSoup
>>> b = BeautifulSoup(s)
>>> b.findAll('a')
<<< [<a href="http://link1.com">Link 1</a>]
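One workaround that might do the job is to strip the conditional-comment markers from the string before parsing, so that the wrapped markup becomes ordinary HTML. The following is only a minimal sketch under several assumptions: it targets Python 3, the names `reveal_conditional_comments`, `LinkCollector` and `extract_links` are hypothetical helpers rather than part of lxml or BeautifulSoup, the regular expression only covers the two marker forms that appear in the sample above, and the standard library's `html.parser` is used as a stand-in parser to keep the example self-contained.

```python
import re
from html.parser import HTMLParser

# Matches the opening and closing markers of conditional comments, in both
# the hidden form (<!--[if mso]> ... <![endif]-->) and the revealed form
# (<!--[if !mso]><!--> ... <!--<![endif]-->) used in the sample above.
CONDITIONAL_MARKERS = re.compile(
    r"<!--\[if[^\]]*\]>(?:<!-->)?"  # opening marker
    r"|(?:<!--)?<!\[endif\]-->",    # closing marker
    re.IGNORECASE,
)


def reveal_conditional_comments(markup):
    """Hypothetical helper: drop the markers so the wrapped markup is parsed."""
    return CONDITIONAL_MARKERS.sub("", markup)


class LinkCollector(HTMLParser):
    """Stand-in parser that records the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


def extract_links(markup):
    collector = LinkCollector()
    collector.feed(reveal_conditional_comments(markup))
    collector.close()
    return collector.links


SAMPLE = """<html>
<body>
<!--[if !mso]><!-->
<a href="http://link1.com">Link 1</a>
<!--<![endif]-->
<!--[if mso]>
<a href="http://link2.com">Link 2</a>
<![endif]-->
</body>
</html>"""

print(extract_links(SAMPLE))
```

The cleaned string returned by `reveal_conditional_comments` should also be usable in place of `s` in the lxml.html and BeautifulSoup sessions above. Note that this ignores the conditions themselves: every branch is treated as live markup, which is what is wanted for collecting links but not for rendering.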