I am working on web scraping in Python using BeautifulSoup. I am trying to extract text that is in bold, in italics, or both. Consider the following HTML snippet:
<div>
<b>
<i>
HelloWorld
</i>
</b>
</div>
If I use the command sp.find_all(['i', 'b']), I understandably get two results, one corresponding to the bold tag and the other to the italic tag, i.e.
['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']
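For reference, here is a minimal script that reproduces this. It assumes the built-in html.parser backend, and the whitespace in the snippet is collapsed onto one line only so that the printed output is easier to read:

```python
from bs4 import BeautifulSoup

# Same structure as the snippet above, with whitespace collapsed (an assumption
# made only for readable output).
html = "<div><b><i>HelloWorld</i></b></div>"

sp = BeautifulSoup(html, "html.parser")

# find_all matches every <i> and every <b>, so the nested text shows up twice:
# once inside the <b> match and once inside the <i> match.
matches = sp.find_all(["i", "b"])
print([str(m) for m in matches])
# ['<b><i>HelloWorld</i></b>', '<i>HelloWorld</i>']
```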
My question is: is there a way to extract each piece of text only once, together with the names of all the tags that enclose it? My desired output is something like:
tag : text - HelloWorld, tagnames : [b,i]
Please note that comparing the text and weeding out non-unique occurrences is not a feasible option, since 'HelloWorld' might be repeated many times in the document, and I would want to extract every occurrence.
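One direction that might work, offered as a rough sketch and not a definitive solution: instead of searching for the tags, iterate over the text nodes and collect the names of any enclosing b or i ancestors, so that each piece of text is reported exactly once regardless of nesting. The helper name `styled_text` and the `STYLE_TAGS` set are made up for illustration, and html.parser is an assumed parser choice.

```python
from bs4 import BeautifulSoup

html = """<div>
<b>
<i>
HelloWorld
</i>
</b>
</div>"""

# Hypothetical set of tag names treated as "styling" tags.
STYLE_TAGS = {"b", "i"}


def styled_text(soup):
    """Return one entry per non-empty text node that sits inside <b> and/or <i>."""
    results = []
    for node in soup.find_all(string=True):
        text = node.strip()
        if not text:
            continue  # skip whitespace-only text nodes between tags
        # .parents walks from the innermost ancestor outwards.
        names = [p.name for p in node.parents if p.name in STYLE_TAGS]
        if names:
            # Reverse so the outermost tag comes first, e.g. ['b', 'i'].
            results.append({"text": text, "tagnames": names[::-1]})
    return results


sp = BeautifulSoup(html, "html.parser")
print(styled_text(sp))
# [{'text': 'HelloWorld', 'tagnames': ['b', 'i']}]
```

Because the loop is driven by text nodes and not by tags, repeated occurrences of the same string elsewhere in the document would each produce their own entry, with no text comparison involved.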
Thanks!